You can try it this way:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName('so')\
.getOrCreate()
sc = spark.sparkContext
map = {'a': 3, 'b': 44}
data = sc.parallelize([(k, v) for k, v in map.items()]).toDF(['A', 'B'])
data.show()
# +---+---+
# | A| B|
# +---+---+
# | a| 3|
# | b| 44|
# +---+---+
The notebook is using version 0.17.1 of pandas, but the autovizwidget depends on a later version of pandas that has the 'api' module. I've been told that this will be resolved in a subsequent release of HDInsight.
ssh into the cluster and run the following:
sudo -HE /usr/bin/anaconda/bin/conda install pandas
Had the same issue. I used:
pip install pandas --upgrade --user
via the terminal available in the Jupyter notebook.
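Either way, you can confirm from the notebook itself that the upgraded pandas is the one being picked up. This quick check is my addition, not part of either answer above:
import pandas as pd
print(pd.__version__)       # should be newer than 0.17.1
print(hasattr(pd, 'api'))   # the autovizwidget expects the pandas.api module to exist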
You don't have pyspark installed in a place available to the python installation you're using. To confirm this, on your command line terminal, with your virtualenv activated, enter your REPL (python) and type import pyspark:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pyspark'
If you see the No module named 'pyspark' ImportError, you need to install that library. Quit the REPL and type:
pip install pyspark
Then re-enter the REPL to confirm it works:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>
As a note, it is critical that your virtual environment is activated. When in the directory of your virtual environment:
$ source bin/activate
These instructions are for a unix-based machine, and will vary for Windows.
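Once the import succeeds, a quick smoke test from the same interpreter confirms that a local session actually starts. This is a minimal sketch of my own; the app name and master setting are arbitrary:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master('local[1]') \
    .appName('smoke-test') \
    .getOrCreate()
print(spark.version)
spark.stop()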
Just use:
import findspark
findspark.init()
import pyspark # only run after findspark.init()
If you don't have the findspark module, install it with:
python -m pip install findspark
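As a side note, findspark.init() locates Spark via the SPARK_HOME environment variable; if that isn't set, you can pass the installation path explicitly. The path below is only an example, adjust it to wherever your Spark distribution lives:
import findspark
findspark.init('/opt/spark')   # example path, replace with your own Spark home

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local[1]').appName('findspark-test').getOrCreate()
print(spark.version)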
SparkSession is not a replacement for a SparkContext but an equivalent of the SQLContext. Just use it the same way you used to use SQLContext:
spark.createDataFrame(...)
and if you ever have to access the SparkContext, use the sparkContext attribute:
spark.sparkContext
so if you need an SQLContext for backwards compatibility, you can use:
SQLContext(sparkContext=spark.sparkContext, sparkSession=spark)
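Putting that together, a minimal sketch (the column names and data here are made up for illustration):
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession.builder.appName('example').getOrCreate()

df = spark.createDataFrame([('a', 1), ('b', 2)], ['key', 'value'])   # what sqlContext.createDataFrame used to do
rdd = spark.sparkContext.parallelize(range(10))                      # SparkContext via the attribute
sqlContext = SQLContext(sparkContext=spark.sparkContext, sparkSession=spark)   # only for legacy code paths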
Whenever you try to create a DataFrame from a backward-compatible object like an RDD, or from a DataFrame created by the Spark session, you need to make your SQLContext aware of your session and context.
For example, if I create an RDD:
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

ss = SparkSession.builder.appName("vivek").master('local').config("k1", "vi").getOrCreate()
rdd = ss.sparkContext.parallelize([('Alex', 21), ('Bob', 44)])
But if we wish to create a DataFrame from this RDD, we need to create an SQLContext that knows about the session:
sq = SQLContext(sparkContext=ss.sparkContext, sparkSession=ss)
Only then can we use that SQLContext with the RDD/DataFrame created above:
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)])
df = sq.createDataFrame(rdd, schema)
df.collect()
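Note that if you already have the SparkSession, you can also skip the explicit SQLContext and build the DataFrame from the session directly (this should behave the same on Spark 2.x+, using the same schema as above):
df = ss.createDataFrame(rdd, schema)
df.collect()
# [Row(name='Alex', age=21), Row(name='Bob', age=44)]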
I had the same error and followed the stack trace.
In my case, I was building an Egg file and then passing it to spark via the --py-files option.
Concerning the error, I think it boils down to the fact that when you call F.udf(str2num, t.IntegerType()) a UserDefinedFunction instance is created before Spark is running, so it has an empty reference to some SparkContext, call it sc. When you run the UDF, sc._pickled_broadcast_vars is referenced and this throws the AttributeError in your output.
My workaround is to avoid creating the UDF until Spark is running (and hence there is an active SparkContext). In your case, you could just change your definition to:
def letConvNum(df):    # df is a PySpark DataFrame
    # Get a list of columns that I want to transform, using the metadata Pandas DataFrame
    chng_cols = metadta[(metadta.comments == 'letter conversion to num')].col_name.tolist()
    str2numUDF = F.udf(str2num, t.IntegerType())    # create UDF on demand
    for curcol in chng_cols:
        df = df.withColumn(curcol, str2numUDF(df[curcol]))
    return df
Note: I haven't actually tested the code above, but the change in my own code was similar and everything worked fine.
Also, for the interested reader, see the Spark code for UserDefinedFunction
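For illustration, here is a small sketch of the on-demand pattern in use once a session exists. str2num here is just a placeholder standing in for the question's conversion function, and the data is made up:
import pyspark.sql.functions as F
import pyspark.sql.types as t
from pyspark.sql import SparkSession

def str2num(text):   # placeholder for the original conversion logic
    return ord(text[0]) if text else 0

spark = SparkSession.builder.appName('udf-demo').getOrCreate()
df = spark.createDataFrame([('a',), ('b',)], ['letter'])

str2numUDF = F.udf(str2num, t.IntegerType())   # created only after the session is up
df.withColumn('letter_num', str2numUDF(df['letter'])).show()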
I think a cleaner solution would be to use the udf decorator to define your udf function:
import pyspark.sql.functions as F
@F.udf
def str2numUDF(text):
    if text is None or text == '' or text == 'NULL' or text == 'null':
        return 0
    elif len(text) == 1:
        return ord(text)
    else:
        newnum = ''
        for lettr in text:
            newnum = newnum + str(ord(lettr))
        return int(newnum)
With this solution, the udf does not reference any other function so it won't throw any errors at you.
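With the decorator applied, the function can be used directly as a column expression. Note that a bare @F.udf defaults to a string return type; the sample data below is made up and assumes an existing SparkSession named spark:
df = spark.createDataFrame([('ab',), ('c',)], ['txt'])
df.select(str2numUDF(df['txt']).alias('txt_num')).show()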
For some older versions of Spark, the decorator doesn't support typed udfs, so you might have to define a custom decorator as follows:
import pyspark.sql.functions as F
import pyspark.sql.types as t
# Custom udf decorator which accepts a return type
def udf_typed(returntype=t.StringType()):
    def _typed_udf_wrapper(func):
        return F.udf(func, returntype)
    return _typed_udf_wrapper

@udf_typed(t.IntegerType())
def my_udf(x):
    return int(x)
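On recent Spark releases (2.1 and later, as far as I know) the built-in decorator accepts a return type directly, so the custom wrapper is only needed on older versions. A short usage sketch, again assuming an existing SparkSession named spark:
import pyspark.sql.functions as F
import pyspark.sql.types as t

@F.udf(returnType=t.IntegerType())
def my_udf(x):
    return int(x)

df = spark.createDataFrame([('1',), ('2',)], ['x'])
df.select(my_udf(df['x']).alias('x_int')).show()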