You can't reference a second Spark DataFrame inside a function unless you're using a join. If I understand correctly, you can do the following to achieve your desired result.

Suppose that means is the following:

#means.show()
#+---+---------+
#| id|avg(col1)|
#+---+---------+
#|  1|     12.0|
#|  3|    300.0|
#|  2|     21.0|
#+---+---------+

Join df and means on the id column, then apply your when condition:

from pyspark.sql.functions import when

df.join(means, on="id")\
    .withColumn(
        "col1",
        when(
            (df["col1"].isNull()), 
            means["avg(col1)"]
        ).otherwise(df["col1"])
    )\
    .select(*df.columns)\
    .show()
#+---+-----+
#| id| col1|
#+---+-----+
#|  1| 12.0|
#|  1| 12.0|
#|  1| 14.0|
#|  1| 10.0|
#|  3|300.0|
#|  3|300.0|
#|  2| 21.0|
#|  2| 22.0|
#|  2| 20.0|
#+---+-----+

But in this case, I'd actually recommend using a Window with pyspark.sql.functions.mean:

from pyspark.sql import Window
from pyspark.sql.functions import col, mean, when

df.withColumn(
    "col1",
    when(
        col("col1").isNull(), 
        mean("col1").over(Window.partitionBy("id"))
    ).otherwise(col("col1"))
).show()
#+---+-----+
#| id| col1|
#+---+-----+
#|  1| 12.0|
#|  1| 10.0|
#|  1| 12.0|
#|  1| 14.0|
#|  3|300.0|
#|  3|300.0|
#|  2| 22.0|
#|  2| 20.0|
#|  2| 21.0|
#+---+-----+
Answer from pault on Stack Overflow
r/learnpython on Reddit: object has no attribute 'index'
October 11, 2021
import pandas as pd
from pandasql import sqldf

class ABC:
    def __init__(self):
        self.details = {'Name' : ['Ankit', 'Aishwarya', 'Shaurya', 'Shivangi'],'Age' : [23, 21, 22, 21]}
        self.df = pd.DataFrame(self.details)

objAbc=ABC()  

for i,rowObject in sqldf("select distinct Name from objAbc.df").iterrows(): 
    print(rowObject.Name)

AttributeError                            Traceback (most recent call last)
<ipython-input-21-aaba2874542d> in <module>
      8 objAbc=ABC()
      9
---> 10 for i,rowObject in sqldf("select distinct Name from objAbc.df").iterrows():
     11     print(rowObject.Name)
     12

~\anaconda3\lib\site-packages\pandasql\sqldf.py in sqldf(query, env, db_uri)
    154     >>> sqldf("select avg(x) from df;", locals())
    155     """
--> 156     return PandaSQL(db_uri)(query, env)

~\anaconda3\lib\site-packages\pandasql\sqldf.py in __call__(self, query, env)
     56                 continue
     57             self.loaded_tables.add(table_name)
---> 58             write_table(env[table_name], table_name, conn)
     59
     60         try:

~\anaconda3\lib\site-packages\pandasql\sqldf.py in write_table(df, tablename, conn)
    119                 message='The provided table name \'%s\' is not found exactly as such in the database' % tablename)
    120     to_sql(df, name=tablename, con=conn,
--> 121            index=not any(name is None for name in df.index.names))  # load index into db if all levels are named

AttributeError: 'ABC' object has no attribute 'index'

I want to iterate through the dataframe defined in class ABC using pandasql's sqldf function, but I'm getting this error:

'ABC' object has no attribute 'index'

I'm not able to understand why the object.property name is not working in sqldf.

PS: As per the requirement, it's mandatory to use sqldf from pandasql.

Please help.
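A hedged reading of the traceback above, not a reply from the thread: write_table receives env["objAbc"] (the ABC instance itself, not its df attribute), because pandasql looks up bare table names in the calling namespace and cannot evaluate attribute access like objAbc.df. Binding the attribute to a plain local name first should avoid the error; a sketch, with the sqldf call left commented so the pandas part stands alone:

```python
import pandas as pd
# from pandasql import sqldf  # assumed installed, per the question's requirement

class ABC:
    def __init__(self):
        self.details = {'Name': ['Ankit', 'Aishwarya', 'Shaurya', 'Shivangi'],
                        'Age': [23, 21, 22, 21]}
        self.df = pd.DataFrame(self.details)

objAbc = ABC()

# Expose the DataFrame under a bare name that sqldf can find in locals()
df = objAbc.df
# for i, rowObject in sqldf("select distinct Name from df", locals()).iterrows():
#     print(rowObject.Name)

# Equivalent check in plain pandas:
print(df["Name"].drop_duplicates().tolist())
```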

Top answer (1 of 5) to a related question, where a DataFrame read from iris.csv was being accessed like the object returned by sklearn's load_iris():

"sklearn.datasets" is a scikit package, where it contains a method load_iris().

load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.

Whereas 'iris.csv', holds feature and target together.

FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.

from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
Answer 2 of 5:

The Iris Dataset from Sklearn is in Sklearn's Bunch format:

print(type(iris))
print(iris.keys())

output:

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

So, that's why you can access it as:

x=iris.data
y=iris.target

But when you read the CSV file into a DataFrame the way you did:

iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()

output is:

    2   3
0   petal_length    petal_width
1   1.4 0.2
2   1.4 0.2
3   1.3 0.2
4   1.5 0.2

Here the column names are '2' and '3', and the real headers (petal_length, petal_width) have been read in as the first data row.

First of all you should read the CSV file as:

df = pd.read_csv('iris.csv')

You should not include header=None, since your CSV file already contains the column names, i.e. the headers.

So, now what you can do is something like this:

X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'

or if you want to use the column names then:

X = df[['petal_length', 'petal_width']]
y = df['species']

Also, if you want to convert the labels from strings to numerical format, use sklearn's LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)