You can't reference a second Spark DataFrame inside a function unless you're using a join. IIUC, you can do the following to achieve your desired result.
Suppose that means is the following:
#means.show()
#+---+---------+
#| id|avg(col1)|
#+---+---------+
#| 1| 12.0|
#| 3| 300.0|
#| 2| 21.0|
#+---+---------+
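For context (not part of the original answer), a df and means pair that reproduces the tables in this answer could be built roughly like this; the sample values are made up to match the output shown, and an active SparkSession named spark is assumed:

from pyspark.sql.functions import avg

# hypothetical sample data; the nulls and per-id averages line up with the tables in this answer
df = spark.createDataFrame(
    [(1, 12.0), (1, None), (1, 14.0), (1, 10.0),
     (3, 300.0), (3, None),
     (2, None), (2, 22.0), (2, 20.0)],
    ["id", "col1"]
)

# per-id average of col1 (avg ignores nulls), which yields the means table above
means = df.groupBy("id").agg(avg("col1"))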
Join df and means on the id column, then apply your when condition
from pyspark.sql.functions import when
df.join(means, on="id")\
.withColumn(
"col1",
when(
(df["col1"].isNull()),
means["avg(col1)"]
).otherwise(df["col1"])
)\
.select(*df.columns)\
.show()
#+---+-----+
#| id| col1|
#+---+-----+
#| 1| 12.0|
#| 1| 12.0|
#| 1| 14.0|
#| 1| 10.0|
#| 3|300.0|
#| 3|300.0|
#| 2| 21.0|
#| 2| 22.0|
#| 2| 20.0|
#+---+-----+
But in this case, I'd actually recommend using a Window with pyspark.sql.functions.mean:
from pyspark.sql import Window
from pyspark.sql.functions import col, mean, when

df.withColumn(
    "col1",
    when(
        col("col1").isNull(),
        mean("col1").over(Window.partitionBy("id"))
    ).otherwise(col("col1"))
).show()
#+---+-----+
#| id| col1|
#+---+-----+
#| 1| 12.0|
#| 1| 10.0|
#| 1| 12.0|
#| 1| 14.0|
#| 3|300.0|
#| 3|300.0|
#| 2| 22.0|
#| 2| 20.0|
#| 2| 21.0|
#+---+-----+
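As a side note not in pault's answer: since mean ignores nulls, the same Window-based fill can be written a bit more compactly with coalesce; a sketch under the same assumptions as above:

from pyspark.sql import Window
from pyspark.sql.functions import coalesce, col, mean

# fill nulls in col1 with the per-id mean; equivalent to the when/otherwise version
df.withColumn(
    "col1",
    coalesce(col("col1"), mean("col1").over(Window.partitionBy("id")))
).show()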
Answer from pault on Stack Overflow
import pandas as pd
from pandasql import sqldf

class ABC:
    def __init__(self):
        self.details = {'Name': ['Ankit', 'Aishwarya', 'Shaurya', 'Shivangi'],
                        'Age': [23, 21, 22, 21]}
        self.df = pd.DataFrame(self.details)

objAbc = ABC()

for i, rowObject in sqldf("select distinct Name from objAbc.df").iterrows():
    print(rowObject.Name)
AttributeError                            Traceback (most recent call last)
<ipython-input-21-aaba2874542d> in <module>
      8 objAbc=ABC()
      9
---> 10 for i,rowObject in sqldf("select distinct Name from objAbc.df").iterrows():
     11     print(rowObject.Name)
     12

~\anaconda3\lib\site-packages\pandasql\sqldf.py in sqldf(query, env, db_uri)
    154     >>> sqldf("select avg(x) from df;", locals())
    155     """
--> 156     return PandaSQL(db_uri)(query, env)

~\anaconda3\lib\site-packages\pandasql\sqldf.py in __call__(self, query, env)
     56                 continue
     57             self.loaded_tables.add(table_name)
---> 58             write_table(env[table_name], table_name, conn)
     59
     60         try:

~\anaconda3\lib\site-packages\pandasql\sqldf.py in write_table(df, tablename, conn)
    119         message='The provided table name \'%s\' is not found exactly as such in the database' % tablename)
    120     to_sql(df, name=tablename, con=conn,
--> 121            index=not any(name is None for name in df.index.names))  # load index into db if all levels are named
    122
    123

AttributeError: 'ABC' object has no attribute 'index'
I want to iterate through the DataFrame defined in class ABC using pandasql's sqldf function, but I am getting this error:
'ABC' object has no attribute 'index'
I don't understand why the object.attribute notation is not working inside sqldf.
PS: Per the requirements, it's mandatory to use sqldf from pandasql.
Please help.
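Not an official answer, but the traceback suggests that sqldf parses objAbc out of "objAbc.df" as a table name and then tries to load the ABC instance itself (which has no .index attribute) into SQLite. A minimal workaround sketch, assuming you are allowed to bind the attribute to a plain local name before querying:

import pandas as pd
from pandasql import sqldf

class ABC:
    def __init__(self):
        self.details = {'Name': ['Ankit', 'Aishwarya', 'Shaurya', 'Shivangi'],
                        'Age': [23, 21, 22, 21]}
        self.df = pd.DataFrame(self.details)

objAbc = ABC()
df = objAbc.df   # plain local name that sqldf can find and write to SQLite

for i, rowObject in sqldf("select distinct Name from df", locals()).iterrows():
    print(rowObject.Name)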
I think you are using the Scala API, in which you use (). In PySpark, use [] instead.
Check your DataFrame with data.columns.
It should print something like this:
Index([u'regiment', u'company', u'name', u'postTestScore'], dtype='object')
Check for hidden white spaces. Then you can rename with:
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is actually something like " Number" or "Number ", i.e. there is a residual space in the column name. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is caused by using . notation to reference a nonexistent column name or pandas method.
pandas methods are accessed with a dot. pandas columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a fast way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute is spelled correctly.
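A small illustration of the whitespace problem, using a made-up frame whose second column name has a trailing space:

import pandas as pd

data = pd.DataFrame({"Name": ["a", "b"], "Number ": [1, 2]})

print("<{}>".format(data.columns[1]))    # prints <Number > and exposes the hidden space
data.columns = data.columns.str.strip()  # strip leading/trailing spaces from every column name
print(data.Number)                       # dot access now works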
The book you're referring to describes the Scala / Java API. In PySpark, use []:
df["count"]
The book combines the Scala and PySpark APIs.
In the Scala / Java API, df.col("column_name") or df.apply("column_name") returns the Column.
In PySpark, use one of the following to get a column from a DataFrame:
df.colName
df["colName"]
I figured it out. It looks like it has to do with our Spark version; it worked with 1.6.
If you are working with Spark version 1.6, use this code to convert an RDD into a DataFrame:
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame(rdd)
If you want to assign column names to the rows, use this:
df = rdd.map(lambda p: Row(ip=p[0], time=p[1], zone=p[2]))
ip, time, and zone are the column names in this example.
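Putting the two steps together, a minimal end-to-end sketch for Spark 1.6 (it assumes an existing SparkContext named sc and an rdd of (ip, time, zone) tuples):

from pyspark.sql import SQLContext, Row

sqlContext = SQLContext(sc)
rows = rdd.map(lambda p: Row(ip=p[0], time=p[1], zone=p[2]))  # attach field names to each record
df = sqlContext.createDataFrame(rows)                         # build the DataFrame from the Row RDD
df.show()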
"sklearn.datasets" is a scikit package, where it contains a method load_iris().
load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.
Whereas 'iris.csv', holds feature and target together.
FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file into a DataFrame the way you did:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
2 3
0 petal_length petal_width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
Here the column labels are just the integers 2 and 3, and the real header row ('petal_length', 'petal_width') has ended up as the first data row.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not include header=None, because your CSV file includes the column names, i.e. the headers.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
Or, if you want to use the column names:
X = df[['petal_length', 'petal_width']]
y = df['species']
Also, if you want to convert the labels from strings to numerical format, use sklearn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
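Putting it together, an end-to-end sketch assuming iris.csv has a header row with petal_length, petal_width, and species columns:

import pandas as pd
from sklearn import preprocessing

df = pd.read_csv('iris.csv')
X = df[['petal_length', 'petal_width']]                        # feature columns
y = preprocessing.LabelEncoder().fit_transform(df['species'])  # species strings -> 0/1/2
print(X.shape, y[:5])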
I came across this question when I was dealing with a PySpark DataFrame. If you're also using a PySpark DataFrame, you can convert it to a pandas DataFrame with the toPandas() method.
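A minimal sketch of that conversion, assuming an active SparkSession named spark and a result small enough to fit on the driver:

spark_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
pandas_df = spark_df.toPandas()   # collects all rows to the driver as a pandas DataFrame
print(pandas_df.dtypes)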
loc was introduced in pandas 0.11, so you'll need to upgrade your pandas to follow the 10 Minutes to pandas introduction.
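To check whether your installation is recent enough, something like this works (the upgrade command is shown as a comment):

import pandas as pd

print(pd.__version__)          # .loc / .iloc require pandas >= 0.11
# pip install --upgrade pandas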