The line
spark = SparkSession.builder.master("local").appName("test").getOrCreate
assigns the method getOrCreate itself to the variable spark, which is not what you want.
Instead, you want to assign the return value of getOrCreate (i.e. a SparkSession) to the variable spark, so you need to call the method with a pair of empty parentheses:
spark = SparkSession.builder.master("local").appName("test").getOrCreate()
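A quick way to spot this mistake (my own illustration, not part of the original answer) is to inspect the type of spark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("test").getOrCreate  # note: no parentheses
print(type(spark))   # <class 'method'> -- a bound method, not a SparkSession

spark = SparkSession.builder.master("local").appName("test").getOrCreate()
print(type(spark))   # <class 'pyspark.sql.session.SparkSession'>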
Answer from mck on Stack Overflow, on the question "Spark join throws 'function' object has no attribute '_get_object_id' error. How could I fix it?"
Adding comment as answer since it solved the problem: count is effectively a reserved name in the DataFrame API, because every DataFrame has a count() method, so naming a column count is dangerous. In your case you can sidestep the error by using bracket-based column access instead of dot notation, e.g.
info["count"]
Dot notation resolves info.count to the DataFrame's count() method rather than the column, which is what triggers the error. With bracket access, the failing line becomes:
movie_names_df = info.join(movies_df, info.movieId == movies_df.ID, "inner").select(movies_df.title, info.average, info.movieId, info["count"])
movie_names_df.show()
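To see why the dot notation fails, here is a minimal illustration of my own, with made-up data:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("count-demo").getOrCreate()
info = spark.createDataFrame([(1, 3.5, 42)], ["movieId", "average", "count"])

print(info.count)      # <bound method DataFrame.count of DataFrame[...]> -- the method shadows the column
print(info["count"])   # Column<'count'> -- bracket access always resolves to the column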
You can't reference a second Spark DataFrame inside a function unless you're using a join. IIUC, you can do the following to achieve your desired result.
Suppose that means is the following:
#means.show()
#+---+---------+
#| id|avg(col1)|
#+---+---------+
#| 1| 12.0|
#| 3| 300.0|
#| 2| 21.0|
#+---+---------+
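(For completeness — this is my assumption, not from the question — such a means DataFrame could come from:)
means = df.groupBy("id").mean("col1")  # GroupedData.mean names the result column avg(col1)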
Join df and means on the id column, then apply your when condition:
from pyspark.sql.functions import when
df.join(means, on="id")\
    .withColumn(
        "col1",
        when(
            df["col1"].isNull(),
            means["avg(col1)"]
        ).otherwise(df["col1"])
    )\
    .select(*df.columns)\
    .show()
#+---+-----+
#| id| col1|
#+---+-----+
#| 1| 12.0|
#| 1| 12.0|
#| 1| 14.0|
#| 1| 10.0|
#| 3|300.0|
#| 3|300.0|
#| 2| 21.0|
#| 2| 22.0|
#| 2| 20.0|
#+---+-----+
But in this case, I'd actually recommend using a Window with pyspark.sql.functions.mean, which avoids the join entirely. Since mean() ignores nulls, the per-id window average is computed from the non-null values:
from pyspark.sql import Window
from pyspark.sql.functions import col, mean, when

df.withColumn(
    "col1",
    when(
        col("col1").isNull(),
        mean("col1").over(Window.partitionBy("id"))
    ).otherwise(col("col1"))
).show()
#+---+-----+
#| id| col1|
#+---+-----+
#| 1| 12.0|
#| 1| 10.0|
#| 1| 12.0|
#| 1| 14.0|
#| 3|300.0|
#| 3|300.0|
#| 2| 22.0|
#| 2| 20.0|
#| 2| 21.0|
#+---+-----+
I think you are using the Scala API, where columns are accessed with (). In PySpark, use [] instead.
Hello community,
I am trying to collect and send the results of a pyspark query to a text file.
However, I keep on getting the error:
AttributeError: 'builtin_function_or_method' object has no attribute example8
I'm extremely new to pyspark.sql. The code is as follows:
#%%
import sys
from operator import add
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv',inferSchema=True,header=True)
example8 = spark.sql("""SELECT
*
FROM sales_info
ORDER BY Sales DESC""")
print.example8.collect()
example8.saveAsTextFile("/home/packt/test.txt")
read_rdd = sc.textFile("/home/packt/test.txt")
read_rdd.collect()
main()
The full error message is as follows:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-714a9bbd2b92> in <module>()
     74 FROM sales_info
     75 ORDER BY Sales DESC""")
---> 76 print.example8.collect()
     77
     78 example8.saveAsTextFile("/home/packt/test.txt")

AttributeError: 'builtin_function_or_method' object has no attribute 'example8'

Any help figuring out the error will be greatly appreciated.
Thanks
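Reading the traceback, print is Python's built-in function, so the attribute access print.example8 fails before anything runs. Below is a minimal sketch of a likely fix — my own reading, not from the thread — with the added assumptions that sales_info must first be registered as a temp view, and that saveAsTextFile is an RDD method rather than a DataFrame one:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv', inferSchema=True, header=True)
df.createOrReplaceTempView("sales_info")  # register the table the SQL query refers to

example8 = spark.sql("""SELECT
*
FROM sales_info
ORDER BY Sales DESC""")
print(example8.collect())  # call print(...) instead of accessing print.example8

# saveAsTextFile exists on RDDs, not DataFrames; convert first
example8.rdd.map(str).saveAsTextFile("/home/packt/test.txt")  # writes a directory of part files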
The syntax you are using is for a pandas DataFrame. To achieve this with a Spark DataFrame, you should use the withColumn() method. This works great for a wide range of well-defined DataFrame functions, but it's a little more complicated for user-defined mapping functions.
General Case
In order to define a udf, you need to specify the output data type. For instance, if you wanted to apply a function my_func that returned a string, you could create a udf as follows:
import pyspark.sql.functions as f
from pyspark.sql.types import StringType

my_udf = f.udf(my_func, StringType())
Then you can use my_udf to create a new column like:
df = df.withColumn('new_column', my_udf(f.col("some_column_name")))
Another option is to use select:
df = df.select("*", my_udf(f.col("some_column_name")).alias("new_column"))
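For instance, with a made-up my_func that upper-cases strings (an illustration of mine, assuming an active spark session):
def my_func(s):
    # hypothetical mapping function: upper-case strings, pass nulls through
    return s.upper() if s is not None else None

df = spark.createDataFrame([("a",), ("b",)], ["some_column_name"])
my_udf = f.udf(my_func, StringType())
df.withColumn("new_column", my_udf(f.col("some_column_name"))).show()
#+----------------+----------+
#|some_column_name|new_column|
#+----------------+----------+
#|               a|         A|
#|               b|         B|
#+----------------+----------+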
Specific Problem
Using a udf
In your specific case, you want to use a dictionary to translate the values of your DataFrame.
Here is a way to define a udf for this purpose:
from pyspark.sql.types import IntegerType

some_map_udf = f.udf(lambda x: some_map.get(x, None), IntegerType())
Notice that I used dict.get() because you want your udf to be robust to bad inputs.
df = df.withColumn('new_column', some_map_udf(f.col("some_column_name")))
Using DataFrame functions
Sometimes using a udf is unavoidable, but whenever possible, using DataFrame functions is usually preferred.
Here is one option to do the same thing without using the udf.
The trick is to iterate over the items in some_map to create a list of pyspark.sql.functions.when() columns.
some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
print(some_map_func)
#[Column<CASE WHEN (some_column_name = a) THEN 0 END>,
# Column<CASE WHEN (some_column_name = c) THEN 1 END>,
# Column<CASE WHEN (some_column_name = b) THEN 1 END>]
Now you can use pyspark.sql.functions.coalesce() inside of a select:
df = df.select("*", f.coalesce(*some_map_func).alias("some_column_name"))
This works because when() returns null by default if the condition is not met, and coalesce() picks the first non-null value it encounters. Since the keys of the map are unique, at most one of these columns will be non-null for any given row.
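Putting the pieces together as a self-contained sketch (the map {"a": 0, "b": 1, "c": 1} and the toy data are my assumptions):
some_map = {"a": 0, "b": 1, "c": 1}   # assumed example map
df = spark.createDataFrame([("a",), ("c",), ("x",)], ["some_column_name"])

some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
df.select("*", f.coalesce(*some_map_func).alias("mapped")).show()
#+----------------+------+
#|some_column_name|mapped|
#+----------------+------+
#|               a|     0|
#|               c|     1|
#|               x|  null|
#+----------------+------+
# "x" matches no key, so every when() is null and coalesce() returns null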
You have a Spark DataFrame, not a pandas DataFrame. To add a new column to the Spark DataFrame:
import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())('some_column_name'))
df.show()
The book you're referring to describes the Scala / Java API. In PySpark, use []:
df["count"]
The book mixes the Scala and PySpark APIs.
In the Scala / Java API, df.col("column_name") or df.apply("column_name") returns the Column.
In PySpark, use either of the following to get a column from a DataFrame:
df.colName
df["colName"]