You can try it this way:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('so')\
    .getOrCreate()

sc = spark.sparkContext

d = {'a': 3, 'b': 44}  # renamed from `map` to avoid shadowing the built-in
data = sc.parallelize(list(d.items())).toDF(['A', 'B'])

data.show()

# +---+---+
# |  A|  B|
# +---+---+
# |  a|  3|
# |  b| 44|
# +---+---+
Answer from kites on Stack Overflow
python - Pyspark - object has no attribute 'spark' - Stack Overflow
I see that within your setUpClass method you are using cls.spark; you need to declare it as an attribute in class UtilsTestCase.
Top answer

I had the same error and followed the stack trace.

In my case, I was building an Egg file and then passing it to spark via the --py-files option.

Concerning the error, I think it boils down to the fact that when you call F.udf(str2num, t.IntegerType()) a UserDefinedFunction instance is created before Spark is running, so it has an empty reference to some SparkContext, call it sc. When you run the UDF, sc._pickled_broadcast_vars is referenced and this throws the AttributeError in your output.

My workaround is to avoid creating the UDF until Spark is running (and hence there is an active SparkContext). In your case, you could just change your definition of

def letConvNum(df):    # df is a PySpark DataFrame
    # Get a list of columns to transform, using the metadata Pandas DataFrame
    chng_cols = metadta[(metadta.comments == 'letter conversion to num')].col_name.tolist()

    str2numUDF = F.udf(str2num, t.IntegerType())  # create the UDF on demand, once Spark is running
    for curcol in chng_cols:
        df = df.withColumn(curcol, str2numUDF(df[curcol]))
    return df

Note: I haven't actually tested the code above, but the change in my own code was similar and everything worked fine.

Also, for the interested reader, see the Spark code for UserDefinedFunction
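The deferral idea is general: don't construct objects that capture live runtime state until that state actually exists. A Spark-free sketch of the failure mode (all names here are hypothetical, invented for illustration):

```python
class Context:
    active = None  # stands in for the SparkContext singleton

def make_udf(fn):
    ctx = Context.active  # captured at creation time, like sc inside UserDefinedFunction
    def udf(x):
        if ctx is None:
            # analogous to the AttributeError on sc._pickled_broadcast_vars
            raise AttributeError("no active context at creation time")
        return fn(x)
    return udf

too_early = make_udf(ord)   # built before the "context" exists -> captures None
Context.active = object()   # the "context" starts
on_demand = make_udf(ord)   # built afterwards -> captures a live reference

print(on_demand('a'))  # 97
```

Calling too_early('a') raises the AttributeError even though the context is now running, because the stale reference was baked in at creation time.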

Another answer

I think a cleaner solution would be to use the udf decorator to define your UDF:

import pyspark.sql.functions as F

@F.udf
def str2numUDF(text):
    # `type(text) == None` is always False; test the value itself
    if text is None or text in ('', 'NULL', 'null'):
        return 0
    elif len(text) == 1:
        return ord(text)
    else:
        newnum = ''
        for lettr in text:
            newnum = newnum + str(ord(lettr))
        return int(newnum)

With this solution, the UDF does not reference any other function, so it won't throw any errors at you.
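Since the conversion logic itself is plain Python, it can be sanity-checked without a Spark session. A quick sketch of the same branches as a bare function (the helper name is just for illustration):

```python
def str2num(text):
    # same branching as the UDF body above, minus Spark
    if text is None or text in ('', 'NULL', 'null'):
        return 0
    if len(text) == 1:
        return ord(text)
    return int(''.join(str(ord(c)) for c in text))

print(str2num('a'))    # 97   (single character -> its code point)
print(str2num('ab'))   # 9798 (code points concatenated)
print(str2num(None))   # 0
```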

For some older versions of Spark, the decorator doesn't support typed UDFs, so you might have to define a custom decorator as follows:

import pyspark.sql.functions as F
import pyspark.sql.types as t

# Custom udf decorator which accepts a return type
def udf_typed(returntype=t.StringType()):
    def _typed_udf_wrapper(func):
        return F.udf(func, returntype)
    return _typed_udf_wrapper

@udf_typed(t.IntegerType())
def my_udf(x):
    return int(x)
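The decorator-with-arguments pattern above is ordinary Python: udf_typed is a factory that, when called with arguments, returns the real decorator. A Spark-free sketch of the same shape (hypothetical names, with the return type coercing the result to stand in for F.udf's returnType):

```python
def typed(returntype=str):
    # factory: called with arguments, returns the actual decorator
    def _wrapper(func):
        def inner(x):
            return returntype(func(x))  # coerce the result to the declared type
        return inner
    return _wrapper

@typed(int)
def parse(x):
    return x

print(parse('42'))  # 42
```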