The line

spark = SparkSession.builder.master("local").appName("test").getOrCreate

assigns the method getOrCreate itself to the variable spark, which is not what you wanted.

Instead, you want to assign the return value of getOrCreate (i.e. a SparkSession) to the variable spark, so you need to call the method with a pair of empty parentheses:

spark = SparkSession.builder.master("local").appName("test").getOrCreate()
Answer from mck on Stack Overflow
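The distinction between referencing a method and calling it can be reproduced in plain Python, without Spark. The Builder class below is a hypothetical stand-in for illustration, not the real pyspark API:

```python
class Builder:
    """Hypothetical stand-in for a fluent builder such as SparkSession.builder."""
    def get_or_create(self):
        return "a session object"

b = Builder()

without_call = b.get_or_create    # the bound method itself is assigned
with_call = b.get_or_create()     # the method is called; its return value is assigned

assert callable(without_call)           # a method object, not a session
assert with_call == "a session object"  # the value you actually wanted
```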
Databricks
kb.databricks.com › python › function-object-no-attribute.html
AttributeError: 'function' object has no attribute
May 19, 2022 - Problem: You are selecting columns from a DataFrame and you get an error message. ERROR: AttributeError: 'function' object has no attribute '_get_object_id'
Discussions

python - Spark join throws 'function' object has no attribute '_get_object_id' error. How could I fix it? - Stack Overflow
I am making a query in Spark in Databricks, and I have a problem when I try to make a join between two dataframes. The two dataframes that I have are the following: "names_df", which has 2 co... More on stackoverflow.com
AttributeError: 'function' object has no attribute '_jrdd' - Spark Streaming - CloudxLab Discussions
December 7, 2018 - I am trying to create a temporary table using a Spark DataFrame from Kafka streaming data, so that I can execute queries against the table. My code is as follows: from pyspark import SparkConf, SparkContext from pyspark.streaming import StreamingContext from pyspark.streaming.kafka import KafkaUtils ... More on discuss.cloudxlab.com
python - SQLContext object has no attribute read while reading csv in pyspark - Stack Overflow
More on stackoverflow.com
Trouble with spark code in Notebook, 'str' object has no attribute 'option'
I can't debug this. I copied it from a Databricks video, so maybe it does not transfer over? import sys from pyspark import SparkContext from pyspark.sql import SparkSession from pyspark.sql.types import * spark = SparkSession \ … More on learn.microsoft.com
Microsoft Learn
learn.microsoft.com › en-us › answers › questions › 778476 › function-that-works-in-python-but-sql-doesnt-pyspa
Function that works in python but sql doesn't pyspark - Microsoft Q&A
March 18, 2022 - Function that works in python but sql doesn't pyspark. I have successfully used a function in Python without problems; I have seen in videos and read that it can be implemented in both contexts, but I cannot do it myself. Here is an image where I use it in Python, and then in another cell I use it in SQL, and the error message appears. Error message: PythonException: 'AttributeError: 'NoneType' object has no attribute '_tools''. But if the geocode function works, it is because it is registered; I installed arcgis, and then I did this.
Cloudera Community
community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › m-p › 78093
Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
January 2, 2024 - AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'. Can someone take a look at the code and let me know where I'm going wrong:

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('sales_info')
    example8 = spark.sql("""SELECT * FROM sales_info ORDER BY Sales DESC""")
    example8.saveAsTextFile("juyfd")

main()
Itversity
discuss.itversity.com › t › pyspark-data-frame-api-attributeerror-str-object-has-no-attribute-desc › 11002
Pyspark Data Frame API - AttributeError: 'str' object has no attribute 'desc' - ITVersity Discussions
December 27, 2020
Spark By {Examples}
sparkbyexamples.com › home › hbase › attributeerror: ‘dataframe’ object has no attribute ‘map’ in pyspark
AttributeError: 'DataFrame' object has no attribute 'map' in PySpark - Spark By {Examples}
March 27, 2024 -

df2=df.map(lambda x: [x[0],x[1]])
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 1401, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'map'
Microsoft Learn
learn.microsoft.com › en-us › answers › questions › 927325 › trouble-with-spark-code-in-notebook-str-object-has
Trouble with spark code in Notebook, 'str' object has no attribute 'option' - Microsoft Q&A
AttributeError Traceback (most recent call last)
/tmp/ipykernel_7707/2196056541.py in <module>
     21
     22 df = spark.read\
---> 23     .format='csv' \
     24     .option("badRecordsPath", 'abfss://*****@synapseqadatalakegen2.dfs.core.windows.net/DataLakehouse/CSV/BadCSV/.csv')\
     25     .option("mode", "PERMISSIVE")\
AttributeError: 'str' object has no attribute 'option'
AWS re:Post
repost.aws › questions › QUvWrsRjenSrqHLJqLpy4DWg › attributeerror-dataframe-object-has-no-attribute-get-object-id
AttributeError: 'DataFrame' object has no attribute '_get_object_id' | AWS re:Post
October 11, 2018 - AttributeError: 'DataFrame' object has no attribute '_get_object_id' when I run the script. I'm pretty confident the error is occurring during this line: datasink = glueContext.write_dynamic_frame.from_catalog(frame = source_dynamic_frame, database = target_database, table_name = target_table_name, transformation_ctx = "datasink") but I can't decipher what it's trying to tell me. Can anyone please help me out or point me in the right direction? Thanks! %pyspark import sys from pyspark.context import SparkContext from pyspark.sql.functions import lit, current_timestamp from pyspark.sql.window i
Cumulative Sum
cumsum.wordpress.com › 2020 › 10 › 10 › pyspark-attributeerror-dataframe-object-has-no-attribute-_get_object_id
[pyspark] AttributeError: ‘DataFrame’ object has no attribute ‘_get_object_id’
October 10, 2020 - Consider the following two data frames, and you want to filter df by id with df2: df = spark.createDataFrame([[1, 2, 3], [2, 3, 4], [4, 5, 6]], ['id', 'a', 'b']) df2 = spark.createDataFrame([[1], [2]], ['id']) df.show() +---+---+---+ | id| a| ...
Hail Discussion
discuss.hail.is › help [0.1]
AttributeError: 'DataFrame' object has no attribute 'to_spark' - Help [0.1] - Hail Discussion
July 22, 2018 - I am trying to covert a Hail table to a pandas dataframe: kk2 = hl.Table.to_pandas(table1) # convert to pandas I am not sure why I am getting this error: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in 1 kk2 = hl.Table.to_pandas(table1) # convert to pandas /home/hail/hail.zip/hail/typecheck/check.py in wrapper(*args, **kwargs) 545 ...
GitHub
github.com › databricks › spark-sklearn › issues › 59
AttributeError: 'function' object has no attribute '_input_kwargs' · Issue #59 · databricks/spark-sklearn
August 22, 2017 - I get the following error:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    km = KeyedEstimator(sklearnEstimator=LinearRegression(), yCol="y").fit(df)
  File "C:\spark-2.2.0-bin-hadoop2.7\python\pyspark\__init__.py", line 104, in wrapper
    return func(self, **kwargs)
  File "C:\Python27\lib\site-packages\spark_sklearn\keyed_models.py", line 323, in __init__
    kwargs = KeyedEstimator._inferredParams(sklearnEstimator, self.__init__._input_kwargs)
AttributeError: 'function' object has no attribute '_input_kwargs'
Author: sounakban
Reddit
reddit.com › r/pyspark › attributeerror: 'builtin_function_or_method' object has no attribute
r/PySpark on Reddit: AttributeError: 'builtin_function_or_method' object has no attribute
August 5, 2018 -

Hello community,

I am trying to collect and send the results from a pyspark query to a textfile.

However, I keep on getting the error:

AttributeError: 'builtin_function_or_method' object has no attribute 'example8'

I'm extremely new to pyspark.sql. The code is as follows:

#%%
import sys
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv',inferSchema=True,header=True)

example8 = spark.sql("""SELECT
*
FROM sales_info
ORDER BY Sales DESC""")

print.example8.collect()
example8.saveAsTextFile("/home/packt/test.txt")
read_rdd = sc.textFile("/home/packt/test.txt")
read_rdd.collect()
main()

The full error message is as follows:


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-714a9bbd2b92> in <module>()
     74 FROM sales_info
     75 ORDER BY Sales DESC""")
---> 76 print.example8.collect()
     77
     78 example8.saveAsTextFile("/home/packt/test.txt")
AttributeError: 'builtin_function_or_method' object has no attribute 'example8'

Any help figuring out the error will be greatly appreciated.

Thanks

Top answer (1 of 2, score 7)

The syntax you are using is for a pandas DataFrame. To achieve this with a Spark DataFrame, you should use the withColumn() method. This works great for a wide range of well-defined DataFrame functions, but it's a little more complicated for user-defined mapping functions.

General Case

In order to define a udf, you need to specify the output data type. For instance, if you wanted to apply a function my_func that returned a string, you could create a udf as follows:

import pyspark.sql.functions as f
from pyspark.sql.types import StringType

my_udf = f.udf(my_func, StringType())

Then you can use my_udf to create a new column like:

df = df.withColumn('new_column', my_udf(f.col("some_column_name")))

Another option is to use select:

df = df.select("*", my_udf(f.col("some_column_name")).alias("new_column"))

Specific Problem

Using a udf

In your specific case, you want to use a dictionary to translate the values of your DataFrame.

Here is a way to define a udf for this purpose:

from pyspark.sql.types import IntegerType

some_map_udf = f.udf(lambda x: some_map.get(x, None), IntegerType())

Notice that I used dict.get() because you want your udf to be robust to bad inputs.

df = df.withColumn('new_column', some_map_udf(f.col("some_column_name")))
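Why dict.get() is the robust choice can be checked in plain Python, independent of Spark: a missing key yields the default (None here, which Spark turns into a null) instead of raising KeyError. The some_map below is a hypothetical example mapping:

```python
some_map = {"a": 0, "b": 1, "c": 1}

# get() returns the default for unseen keys instead of raising
assert some_map.get("a", None) == 0
assert some_map.get("unexpected", None) is None

# plain indexing would make the udf blow up on bad inputs
try:
    some_map["unexpected"]
    raised = False
except KeyError:
    raised = True
assert raised
```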

Using DataFrame functions

Sometimes using a udf is unavoidable, but whenever possible, using DataFrame functions is usually preferred.

Here is one option to do the same thing without using the udf.

The trick is to iterate over the items in some_map to create a list of pyspark.sql.functions.when() expressions.

some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
print(some_map_func)
#[Column<CASE WHEN (some_column_name = a) THEN 0 END>,
# Column<CASE WHEN (some_column_name = c) THEN 1 END>,
# Column<CASE WHEN (some_column_name = b) THEN 1 END>]

Now you can use pyspark.sql.functions.coalesce() inside of a select:

df = df.select("*", f.coalesce(*some_map_func).alias("some_column_name"))

This works because when() returns null by default if the condition is not met, and coalesce() will pick the first non-null value it encounters. Since the keys of the map are unique, at most one column will be non-null.
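The when()/coalesce() behaviour described above can be mimicked in plain Python to see why unique map keys guarantee at most one non-null candidate. This is a Spark-free analogue for intuition only, not the real pyspark API:

```python
some_map = {"a": 0, "b": 1, "c": 1}

def when(condition, value):
    # like f.when(cond, v) with no otherwise(): v if cond holds, else null (None)
    return value if condition else None

def coalesce(*values):
    # like f.coalesce(...): the first non-null value, or null if all are null
    return next((v for v in values if v is not None), None)

def translate(cell):
    candidates = [when(cell == k, v) for k, v in some_map.items()]
    return coalesce(*candidates)

assert translate("a") == 0
assert translate("c") == 1
assert translate("z") is None  # no condition matched, so every candidate was null
```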

Second answer (2 of 2, score 1)

You have a Spark DataFrame, not a pandas DataFrame. To add a new column to the Spark DataFrame:

import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())('some_column_name'))
df.show()
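As a plain-Python aside (no Spark required): passing some_map.get directly works because a bound method is itself a one-argument callable, so it can serve as the udf's function. The some_map below is a hypothetical example mapping:

```python
some_map = {"a": 0, "b": 1, "c": 1}

lookup = some_map.get  # a bound method, usable wherever a function is expected

assert lookup("b") == 1
assert lookup("missing") is None
assert list(map(lookup, ["a", "c", "z"])) == [0, 1, None]
```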
GitHub
github.com › microsoft › FLAML › issues › 625
AttributeError: 'DataFrame' object has no attribute 'copy' · Issue #625 · microsoft/FLAML
July 2, 2022 -

train = spark.read.parquet("./train.parquet")
test = spark.read.parquet("./test.parquet")
input_cols = [c for c in train.columns if c != 'target']
vectorAssembler = VectorAssembler(inputCols = input_cols, outputCol = 'features')
vectorAssembler.setHandleInvalid("skip").transform(train).show
train_sprk = vectorAssembler.transform(train)
test_sprk = vectorAssembler.transform(test)
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
y = train_sprk["target"]
X = train_sprk[input_cols]
X, y = make_classification()
X_train, X_test, y_train, y_test =
Author: Shafi2016
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132
AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132
February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...