You mixed up pandas dataframe and Spark dataframe.

The issue is pandas df doesn't have spark function withColumn.

Answer from Ani Menon on Stack Overflow
🌐
Itsourcecode
itsourcecode.com › home › attributeerror: ‘dataframe’ object has no attribute ‘withcolumn’
attributeerror: 'dataframe' object has no attribute 'withcolumn' |Fixed
March 31, 2023 - The error message “‘DataFrame’ object has no attribute ‘withColumn’” occurs when you are trying to add a new column to a Pandas DataFrame using the withColumn() method.
Discussions

python - I got the following error : 'DataFrame' object has no attribute 'data' - Data Science Stack Exchange
I am trying to get the 'data' and the 'target' of the iris setosa database, but I can't. For example, when I load the iris setosa directly from sklearn datasets I get a good result: Program: from More on datascience.stackexchange.com
🌐 datascience.stackexchange.com
August 26, 2018
AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working
What happened: I perform a pipeline of transformations on a Dask dataframe originating from dd.read_sql_table() from a view in an oracle DB. In one stage that follows many successful stages, I try ... More on github.com
🌐 github.com
10
January 26, 2022
python - How to resolve AttributeError: 'DataFrame' object has no attribute - Stack Overflow
I know that this kind of question was asked before and I've checked all the answers and I have tried several times to find a solution but in vain. In fact I call a Dataframe using Pandas. I've upl... More on stackoverflow.com
🌐 stackoverflow.com
"'DataFrame' object has no attribute" Issue

Double check if there's a space in the column name. 'Survived ' vs 'Survived' It happens more often than you'd think especially with CSV data source.

More on reddit.com
🌐 r/learnpython
8
1
October 30, 2020
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute-rename › td-p › 28109
Solved: AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 28109
January 2, 2024 - Hello, I am doing the Data Science and Machine Learning course. The Boston housing has unintuitive column names. I want to rename them, e.g. so 'zn' becomes 'Zoning'. When I run this command: df_bostonLegible = df_boston.rename({'zn':'Zoning'}, axis='columns') Then I get the error "AttributeError: '...
🌐
Plain English
python.plainenglish.io › how-to-fix-attributeerror-in-python-6cea86059a27
How to Fix AttributeError in Python? | by JOKEN VILLANUEVA | Python in Plain English
January 27, 2025 - The error message “‘DataFrame’ object has no attribute ‘withColumn’” occurs when you are trying to add a new column to a Pandas DataFrame using the withColumn() method.
🌐
Apache
spark.apache.org › docs › latest › api › python › reference › pyspark.sql › api › pyspark.sql.DataFrame.withColumn.html
pyspark.sql.DataFrame.withColumn — PySpark 4.1.1 documentation
To avoid this, use select() with multiple columns at once. ... >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"]) >>> df.withColumn('age2', df.age + 2).show() +---+-----+----+ |age| name|age2| +---+-----+----+ | 2|Alice| 4| | 5| Bob| 7| +---+-----+----+
Top answer
1 of 5
2

"sklearn.datasets" is a scikit package, where it contains a method load_iris().

load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.

Whereas 'iris.csv', holds feature and target together.

FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.

from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
2 of 5
1

The Iris Dataset from Sklearn is in Sklearn's Bunch format:

print(type(iris))
print(iris.keys())

output:

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

So, that's why you can access it as:

x=iris.data
y=iris.target

But when you read the CSV file as DataFrame as mentioned by you:

iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()

output is:

    2   3
0   petal_length    petal_width
1   1.4 0.2
2   1.4 0.2
3   1.3 0.2
4   1.5 0.2

Here the column names are '1' and '2'.

First of all you should read the CSV file as:

df = pd.read_csv('iris.csv')

you should not include header=None as your csv file includes the column names i.e. the headers.

So, now what you can do is something like this:

X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'

or if you want to use the column names then:

X = df[['petal_length', 'petal_width']]
y = df.iloc['species']

Also, if you want to convert labels from string to numerical format use sklearn LabelEncoder

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
🌐
GitHub
github.com › dask › dask › issues › 8624
AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working · Issue #8624 · dask/dask
January 26, 2022 - The method being applied should return a Dask dataframe of 3 np.int64 columns: state_id, city_id, district_id. inspect.getmembers(my_dask_df) was expected to return a list of object names and values to introspect the dask dataframe object.
Author   david-thrower
Find elsewhere
🌐
Cloudera Community
community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › m-p › 78093
Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
January 2, 2024 - #%% import findspark findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7') from pyspark.sql import SparkSession spark = SparkSession.builder.appName('ops').getOrCreate() df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True) df.createOrReplaceTempView('Person_Person') myresults = spark.sql("""SELECT PersonType ,COUNT(PersonType) AS `Person Count` FROM Person_Person GROUP BY PersonType""") myresults.collect() result = myresults.collect() result result.saveAsTextFile("test") However, I'm now getting the following error message: AttributeError: 'list' object has no attribute 'saveAsTextFile'
🌐
Reddit
reddit.com › r/learnpython › "'dataframe' object has no attribute" issue
r/learnpython on Reddit: "'DataFrame' object has no attribute" Issue
October 30, 2020 -

I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.

A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:

##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]  
test_var = test.loc[:,['Sex']]  
BayesNet = BayesianModel([('Sex','Survived')])

I am supposed to add another variable, 'Pclass,' to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:

predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictions

For example, the error I get for this version of the code:

train_var = train.loc[:,['Survived','Pclass','Sex']]  
test_var = test.loc[:,['Pclass']]  
BayesNet = BayesianModel([('Sex','Pclass','Survived')])

is this:

AttributeError                            Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
      2 predictions

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'Survived'

Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.

Any help would be greatly appreciated. I know it's a lot.

🌐
Reddit
reddit.com › r/learnpython › attributeerror: 'dataframe' object has no attribute 'data'
r/learnpython on Reddit: AttributeError: 'DataFrame' object has no attribute 'data'
September 29, 2021 -
wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
dataset

ERROR:

<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
      2 dataset['target']=df.target
      3 dataset

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465             return object.__getattribute__(self, name)
   5466 
   5467     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'data'

I'm trying to set up a target to proceed with my Multi Linear Regression Project, but I can't even do that. I've already downloaded the CSV file and have it uploaded on a Jupyter Notebook. What I'm I doing wrong?

🌐
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132
AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132
February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...
🌐
Brainly
brainly.com › computers and technology › high school › how to resolve attributeerror: 'dataframe' object has no attribute?
[FREE] How to resolve AttributeError: 'DataFrame' object has no attribute? - brainly.com
When you encounter an AttributeError with a DataFrame object, it typically means that you are trying to access a column or method that does not exist on the DataFrame.
🌐
Stack Overflow
stackoverflow.com › questions › 76139983 › dataframe-object-has-no-attribute-withcolumn-getting-this-error
'DataFrame' object has no attribute 'withColumn' Getting this error - Stack Overflow
April 30, 2023 - I can't currently test this so may be a bit off, but pandas merge-function doesn't return a pyspark dataframe, it returns a pandas dataframe which - as far as I know - does not implement withColumn.
Top answer
1 of 2
7

The syntax you are using is for a pandas DataFrame. To achieve this for a spark DataFrame, you should use the withColumn() method. This works great for a wide range of well defined DataFrame functions, but it's a little more complicated for user defined mapping functions.

General Case

In order to define a udf, you need to specify the output data type. For instance, if you wanted to apply a function my_func that returned a string, you could create a udf as follows:

import pyspark.sql.functions as f
my_udf = f.udf(my_func, StringType())

Then you can use my_udf to create a new column like:

df = df.withColumn('new_column', my_udf(f.col("some_column_name")))

Another option is to use select:

df = df.select("*", my_udf(f.col("some_column_name")).alias("new_column"))

Specific Problem

Using a udf

In your specific case, you want to use a dictionary to translate the values of your DataFrame.

Here is a way to define a udf for this purpose:

some_map_udf = f.udf(lambda x: some_map.get(x, None), IntegerType())

Notice that I used dict.get() because you want your udf to be robust to bad inputs.

df = df.withColumn('new_column', some_map_udf(f.col("some_column_name")))

Using DataFrame functions

Sometimes using a udf is unavoidable, but whenever possible, using DataFrame functions is usually preferred.

Here is one option to do the same thing without using the udf.

The trick is to iterate over the items in some_map to create a list of pyspark.sql.functions.when() functions.

some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
print(some_map_func)
#[Column<CASE WHEN (some_column_name = a) THEN 0 END>,
# Column<CASE WHEN (some_column_name = c) THEN 1 END>,
# Column<CASE WHEN (some_column_name = b) THEN 1 END>]

Now you can use pyspark.sql.functions.coalesce() inside of a select:

df = df.select("*", f.coalesce(*some_map_func).alias("some_column_name"))

This works because when() returns null by default if the condition is not met, and coalesce() will pick the first non-null value it encounters. Since the keys of the map are unique, at most one column will be non-null.

2 of 2
1

You have a spark dataframe, not a pandas dataframe. To add new column to the spark dataframe:

import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())(some_column_name))
df.show()
🌐
Saturn Cloud
saturncloud.io › blog › solving-the-dataframe-object-has-no-attribute-name-error-in-pandas
Solving the 'DataFrame Object Has No Attribute 'name' Error in Pandas | Saturn Cloud Blog
November 2, 2023 - import pandas as pd # Create a ... 'DataFrame' object has no attribute 'name'. This is because a DataFrame as a whole does not have a 'name' attribute....
🌐
Kaggle
kaggle.com › general › 108926
AttributeError: 'DataFrame' object has no attribute 'dtype' ...
Checking your browser before accessing www.kaggle.com · Click here if you are not automatically redirected after 5 seconds