attributeerror: 'dataframe' object has no attribute '_get_object_id

pyspark 'DataFrame' object has no attribute '_get_object_id'

stackoverflow.com › questions › 57363618 › pyspark-dataframe-object-has-no-attribute-get-object-id

You can't reference a second spark DataFrame inside a function, unless you're using a join. IIUC, you can do the following to achieve your desired result.

Suppose that means is the following:

#means.show()
#+---+---------+
#| id|avg(col1)|
#+---+---------+
#|  1|     12.0|
#|  3|    300.0|
#|  2|     21.0|
#+---+---------+

Join df and means on the id column, then apply your when condition

from pyspark.sql.functions import when

df.join(means, on="id")\
    .withColumn(
        "col1",
        when(
            (df["col1"].isNull()), 
            means["avg(col1)"]
        ).otherwise(df["col1"])
    )\
    .select(*df.columns)\
    .show()
#+---+-----+
#| id| col1|
#+---+-----+
#|  1| 12.0|
#|  1| 12.0|
#|  1| 14.0|
#|  1| 10.0|
#|  3|300.0|
#|  3|300.0|
#|  2| 21.0|
#|  2| 22.0|
#|  2| 20.0|
#+---+-----+

But in this case, I'd actually recommend using a Window with pyspark.sql.functions.mean:

from pyspark.sql import Window
from pyspark.sql.functions import col, mean

df.withColumn(
    "col1",
    when(
        col("col1").isNull(), 
        mean("col1").over(Window.partitionBy("id"))
    ).otherwise(col("col1"))
).show()
#+---+-----+
#| id| col1|
#+---+-----+
#|  1| 12.0|
#|  1| 10.0|
#|  1| 12.0|
#|  1| 14.0|
#|  3|300.0|
#|  3|300.0|
#|  2| 22.0|
#|  2| 20.0|
#|  2| 21.0|
#+---+-----+

Answer from pault on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 57363618 › pyspark-dataframe-object-has-no-attribute-get-object-id

python - pyspark 'DataFrame' object has no attribute '_get_object_id' - Stack Overflow

Top answer

1 of 2

You can't reference a second spark DataFrame inside a function, unless you're using a join. IIUC, you can do the following to achieve your desired result.

Suppose that means is the following:

#means.show()
#+---+---------+
#| id|avg(col1)|
#+---+---------+
#|  1|     12.0|
#|  3|    300.0|
#|  2|     21.0|
#+---+---------+

Join df and means on the id column, then apply your when condition

from pyspark.sql.functions import when

df.join(means, on="id")\
    .withColumn(
        "col1",
        when(
            (df["col1"].isNull()), 
            means["avg(col1)"]
        ).otherwise(df["col1"])
    )\
    .select(*df.columns)\
    .show()
#+---+-----+
#| id| col1|
#+---+-----+
#|  1| 12.0|
#|  1| 12.0|
#|  1| 14.0|
#|  1| 10.0|
#|  3|300.0|
#|  3|300.0|
#|  2| 21.0|
#|  2| 22.0|
#|  2| 20.0|
#+---+-----+

But in this case, I'd actually recommend using a Window with pyspark.sql.functions.mean:

from pyspark.sql import Window
from pyspark.sql.functions import col, mean

df.withColumn(
    "col1",
    when(
        col("col1").isNull(), 
        mean("col1").over(Window.partitionBy("id"))
    ).otherwise(col("col1"))
).show()
#+---+-----+
#| id| col1|
#+---+-----+
#|  1| 12.0|
#|  1| 10.0|
#|  1| 12.0|
#|  1| 14.0|
#|  3|300.0|
#|  3|300.0|
#|  2| 22.0|
#|  2| 20.0|
#|  2| 21.0|
#+---+-----+

2 of 2

-5

I think you are using Scala API, in which you use (). In PySpark, use [] instead.

AWS re:Post

repost.aws › questions › QUvWrsRjenSrqHLJqLpy4DWg › attributeerror-dataframe-object-has-no-attribute-get-object-id

AttributeError: 'DataFrame' object has no attribute '_get_object_id' | AWS re:Post

October 11, 2018 - Using the Zeppilin notebook server, I have written the following script. The initialization is taken from the template created in glue, but the rest of it is custom. I'm getting the error: ``` Att...

Discussions

"'DataFrame' object has no attribute" Issue

Double check if there's a space in the column name. 'Survived ' vs 'Survived' It happens more often than you'd think especially with CSV data source.

More on reddit.com

r/learnpython

October 30, 2020

AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working

What happened: I perform a pipeline of transformations on a Dask dataframe originating from dd.read_sql_table() from a view in an oracle DB. In one stage that follows many successful stages, I try ... More on github.com

github.com

January 26, 2022

AttributeError: 'numpy.int64' object has no attribute '_get_object_id'

~/.virtualenvs/iv/lib/python3.... += ";" + interface 297 else: --> 298 command_part = REFERENCE_TYPE + parameter._get_object_id() 299 300 command_part += "\n" AttributeError: 'numpy.int64' object has no attribute '_get_object_id'... More on github.com

github.com

May 19, 2021

python - I got the following error : 'DataFrame' object has no attribute 'data' - Data Science Stack Exchange

I am trying to get the 'data' and the 'target' of the iris setosa database, but I can't. For example, when I load the iris setosa directly from sklearn datasets I get a good result: Program: from More on datascience.stackexchange.com

datascience.stackexchange.com

August 26, 2018

Cumulative Sum

cumsum.wordpress.com › 2020 › 10 › 10 › pyspark-attributeerror-dataframe-object-has-no-attribute-_get_object_id

[pyspark] AttributeError: ‘DataFrame’ object has no attribute ‘_get_object_id’

October 10, 2020 - df[df.id.isin(df2.select('id'))].show() ... ‘_get_object_id’ · The reason being that isin expects actual local values or collections but df2.select('id') returns a data frame....

Databricks

kb.databricks.com › python › function-object-no-attribute

AttributeError: ‘function’ object has no attribute - Databricks

May 19, 2022 - If a column in your DataFrame uses a protected keyword as the column name, you will get an error message. For example, summary is a protected keyword. If you use summary as a column name, you will see the error message. This sample code uses summary as a column name and generates the error message when run. %python df=spark.createDataFrame([1,2], "int").toDF("id") df.show() from pyspark.sql.types import StructType,StructField, StringType, IntegerType df1 = spark.createDataFrame( [(10,), (11,), (13,)], StructType([StructField("summary", IntegerType(), True)])) df1.show() ResultDf = df1.join(df, df1.summary == df.id, "inner").select(df.id,df1.summary) ResultDf.show()

reddit.com › r/learnpython › "'dataframe' object has no attribute" issue

r/learnpython on Reddit: "'DataFrame' object has no attribute" Issue

October 30, 2020 -

I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.

A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:

##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]  
test_var = test.loc[:,['Sex']]  
BayesNet = BayesianModel([('Sex','Survived')])

I am supposed to add another variable, 'Pclass,' to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:

predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictions

For example, the error I get for this version of the code:

train_var = train.loc[:,['Survived','Pclass','Sex']]  
test_var = test.loc[:,['Pclass']]  
BayesNet = BayesianModel([('Sex','Pclass','Survived')])

is this:

AttributeError                            Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
      2 predictions

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'Survived'

Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.

Any help would be greatly appreciated. I know it's a lot.

Top answer

1 of 2

Double check if there's a space in the column name. 'Survived ' vs 'Survived' It happens more often than you'd think especially with CSV data source.

2 of 2

It's an issue with how you're calling the data and if it's actually there.

train.loc[:,['Survived','Sex']]

tells me that there's a DataFrame (which is from pandas, hence the error) called train and this line is trying to access parts of that dataframe (it's just a type of an array). Specifically, it's trying to access columns named Survived and Sex.

Similarly, this line tells me there's another dataframe (df) known as test with a column named Sex and this is access that data.

test.loc[:,['Sex']]

The error code also informs me of some things

predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})

There's another df called predictions that's of dict type which is trying to access information from the another hypothesis df. The attribute it's tryin to access in the second key of the dict is

hypothesis.Survived.tolist()

which is a way of calling a column from that df. That is, when the predictions line is executed, it's trying to pull all the values from the Survived column of the hypothesis df.

The error is that the df doesn't actually have a column named Survived. So either there's missing data, or you're calling it wrong, or there's a missing reference.

Without knowing more about your code and your question, I can't really extrapolate much more.

Itsourcecode

itsourcecode.com › home › ‘dataframe’ object has no attribute ‘_get_object_id’ [fixed]

'dataframe' object has no attribute '_get_object_id' [FIXED]

March 28, 2023 - attributeerror: 'dataframe' object has no attribute '_get_object_id' in Python can be easily solved by upgrading your Pandas or using the reset_index() method instead of _get_object_id() method.

Cloudera Community

community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › m-p › 78093

Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

January 2, 2024 - #%% import findspark findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7') from pyspark.sql import SparkSession spark = SparkSession.builder.appName('ops').getOrCreate() df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True) df.createOrReplaceTempView('Person_Person') myresults = spark.sql("""SELECT PersonType ,COUNT(PersonType) AS `Person Count` FROM Person_Person GROUP BY PersonType""") myresults.collect() result = myresults.collect() result result.saveAsTextFile("test") However, I'm now getting the following error message: AttributeError: 'list' object has no attribute 'saveAsTextFile'

Databricks Community

community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132

AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132

February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...

Find elsewhere

Google Bing Mojeek

GitHub

github.com › dask › dask › issues › 8624

AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working · Issue #8624 · dask/dask

January 26, 2022 - Then in the main script where I had done all the upstream transformations to the dataframe which we are working on, instantiate this class and call its get_df_of_ids method: { from myimportutils.my_import_utils import SomeUtils tools = SomeUtils(homes_to_import_unnormalized_dd_df) df_of_locale_ids = tools.get_df_of_ids() # This line fails: } ... { File "[ ... redacted]/pandas/core/generic.py", line 5487, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'name'.

Author david-thrower

GitHub

github.com › databricks › koalas › issues › 2161

AttributeError: 'numpy.int64' object has no attribute '_get_object_id' · Issue #2161 · databricks/koalas

May 19, 2021 - a = ks.DataFrame({'source': [1,2,3,4,5]}) a.source.isin([np.int64(1), np.int64(2)]) AttributeError: 'numpy.int64' object has no attribute '_get_object_id' But this is ok a = ks.DataFrame({'source': [1,2,3,4,5]}) a.source.isin([1, 2]) Ful...

Published May 19, 2021

Author lepmik

Stack Exchange

datascience.stackexchange.com › questions › 37435 › i-got-the-following-error-dataframe-object-has-no-attribute-data

python - I got the following error : 'DataFrame' object has no attribute 'data' - Data Science Stack Exchange

Top answer

1 of 5

"sklearn.datasets" is a scikit package, where it contains a method load_iris().

load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.

Whereas 'iris.csv', holds feature and target together.

FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.

from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)

2 of 5

The Iris Dataset from Sklearn is in Sklearn's Bunch format:

print(type(iris))
print(iris.keys())

output:

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

So, that's why you can access it as:

x=iris.data
y=iris.target

But when you read the CSV file as DataFrame as mentioned by you:

iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()

output is:

    2   3
0   petal_length    petal_width
1   1.4 0.2
2   1.4 0.2
3   1.3 0.2
4   1.5 0.2

Here the column names are '1' and '2'.

First of all you should read the CSV file as:

df = pd.read_csv('iris.csv')

you should not include header=None as your csv file includes the column names i.e. the headers.

So, now what you can do is something like this:

X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'

or if you want to use the column names then:

X = df[['petal_length', 'petal_width']]
y = df.iloc['species']

Also, if you want to convert labels from string to numerical format use sklearn LabelEncoder

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)

GitHub

github.com › mwaskom › seaborn › issues › 3379

AttributeError: 'DataFrame' object has no attribute 'get' · Issue #3379 · mwaskom/seaborn

June 6, 2023 - 3185 p = _CategoricalPlotter() 3186 p.require_numeric = plotter_class.require_numeric -> 3187 p.establish_variables(x_, y_, hue, data, orient, order, hue_order) 3188 if ( 3189 order is not None 3190 or (sharex and p.orient == "v") 3191 or (sharey and p.orient == "h") 3192 ): 3193 # Sync categorical axis between facets to have the same categories 3194 order = p.group_names ... --> 532 x = data.get(x, x) 533 y = data.get(y, y) 534 hue = data.get(hue, hue) AttributeError: 'DataFrame' object has no attribute 'get'

Author nick-youngblut

reddit.com › r/learnpython › attributeerror: 'dataframe' object has no attribute 'data'

r/learnpython on Reddit: AttributeError: 'DataFrame' object has no attribute 'data'

September 29, 2021 -

wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
dataset

ERROR:

<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
      2 dataset['target']=df.target
      3 dataset

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465             return object.__getattribute__(self, name)
   5466 
   5467     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'data'

I'm trying to set up a target to proceed with my Multi Linear Regression Project, but I can't even do that. I've already downloaded the CSV file and have it uploaded on a Jupyter Notebook. What I'm I doing wrong?

Top answer

1 of 3

I would recommend you start off by reading the 10min pandas Quick Guide . This should get you on the right track (at the very least you will be able to load data into pandas). Afterwards, you should also be able to tell why your current code doesn't make much sense.

2 of 3

Is there a column in your file called data? you already have a df from the first line. Maybe you just want to rename the columns. df.rename(columns={})

Apache

lists.apache.org › thread › 4fkyq539n629kdy7mdcjww2jrfh3snd5

Re: pyspark not working for me...

Email display mode: · Modern rendering · Legacy rendering · This site requires JavaScript enabled. Please enable it

Saturn Cloud

saturncloud.io › blog › solving-the-dataframe-object-has-no-attribute-name-error-in-pandas

Solving the 'DataFrame Object Has No Attribute 'name' Error in Pandas | Saturn Cloud Blog

November 2, 2023 - Pandas is a powerful data manipulation library in Python, widely used by data scientists and analysts. However, it's not uncommon to encounter errors while working with it. One such error is the 'DataFrame object has no attribute 'name'' error. This blog post will guide you through understanding ...

Mbse-capella

forum.mbse-capella.org › scripting

Object has no attribute '_get_object_id' - Scripting - Eclipse Capella Forum

May 23, 2022 - requirements Packages (pkg): def get_reqPKG3(self): “”" “”" return create_e_list(self.get_java_object().getOwnedRequirementPkgs(), RequirementPkg) I do obtain a java list containing the requirements packages successfully: print(se.get_physical_architecture().get_reqPKG3()) → The...

GitHub

github.com › YosefLab › Compass › issues › 92

AttributeError: 'DataFrame' object has no attribute 'iteritems' · Issue #92 · YosefLab/Compass

April 5, 2023 - "Evaluating Reaction Pentalties:" Traceback (most recent call last): File "/home/ubuntu/.local/bin/compass", line 8, in sys.exit(entry()) File "/home/ubuntu/.local/lib/python3.8/site-packages/compass/main.py", line 588, in entry penalties = eval_reaction_penalties(args['data'], args['model'], File "/home/ubuntu/.local/lib/python3.8/site-packages/compass/compass/penalties.py", line 96, in eval_reaction_penalties reaction_penalties = eval_reaction_penalties_shared( File "/home/ubuntu/.local/lib/python3.8/site-packages/compass/compass/penalties.py", line 158, in eval_reaction_penalties_shared for name, expression_data in expression.iteritems(): File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in getattr return object.getattribute(self, name) AttributeError: 'DataFrame' object has no attribute 'iteritems'

Author JoelHaas

Stack Overflow

stackoverflow.com › questions › 38134643 › how-to-resolve-attributeerror-dataframe-object-has-no-attribute

python - How to resolve AttributeError: 'DataFrame' object has no attribute - Stack Overflow

Top answer

1 of 7

Check your DataFrame with data.columns

It should print something like this

Index([u'regiment', u'company',  u'name',u'postTestScore'], dtype='object')

Check for hidden white spaces..Then you can rename with

data = data.rename(columns={'Number ': 'Number'})

2 of 7

I think the column name that contains "Number" is something like " Number" or "Number ". I'm assuming you might have a residual space in the column name. Please run print "<{}>".format(data.columns[1]) and see what you get. If it's something like < Number>, it can be fixed with:

data.columns = data.columns.str.strip()

See pandas.Series.str.strip

In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is caused because . notation has been used to reference a nonexistent column name or pandas method.

pandas methods are accessed with a .. pandas columns can also be accessed with a . (e.g. data.col) or with brackets (e.g. ['col'] or [['col1', 'col2']]).

data.columns = data.columns.str.strip() is a fast way to quickly remove leading and trailing spaces from all column names. Otherwise verify the column or attribute is correctly spelled.

YouTube

youtube.com › watch

How to fix AttributeError: 'DataFrame' object has no attribute 'columns' whe... in Python - YouTube

00:59

Hello, Dedicated Coders! 🖥️💡We're excited to share with you our newest video, "How to solve AttributeError: 'DataFrame' object has no attribute 'columns' ...

Published May 5, 2024

Views 290