You mixed up a pandas DataFrame and a Spark DataFrame.

The issue is that a pandas DataFrame doesn't have the Spark method withColumn.

Answer from Ani Menon on Stack Overflow
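As a rough illustration of that point, here is a minimal pandas sketch. The toy frame and the column names `age`, `name` and `age2` are assumptions made up for this example. It shows that a pandas frame lacks `withColumn()` and that `assign()` is one way to add a column instead.

```python
import pandas as pd

# Hypothetical toy data, not taken from the original question.
pdf = pd.DataFrame({"age": [2, 5], "name": ["Alice", "Bob"]})

# pandas has no withColumn(); assign() returns a new frame with the extra column.
with_age2 = pdf.assign(age2=pdf["age"] + 2)

# Confirm that the pandas frame really lacks the Spark method.
has_with_column = hasattr(pdf, "withColumn")

print(with_age2)
print(has_with_column)
```

If the data has to end up in Spark, `spark.createDataFrame(pdf)` converts a pandas frame into a Spark DataFrame, after which `withColumn()` is available.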
🌐
Itsourcecode
itsourcecode.com › home › attributeerror: ‘dataframe’ object has no attribute ‘withcolumn’
attributeerror: 'dataframe' object has no attribute 'withcolumn' |Fixed
March 31, 2023 - The error message “‘DataFrame’ object has no attribute ‘withColumn’” occurs when you are trying to add a new column to a Pandas DataFrame using the withColumn() method.
Discussions

'DataFrame' object has no attribute 'withColumn' Getting this error - Stack Overflow
I am trying to do string matching, but I am getting this error while creating a column. Please help. (AttributeError: 'DataFrame' object has no attribute 'withColumn') from pyspark.sql import More on stackoverflow.com
🌐 stackoverflow.com
April 30, 2023
"'DataFrame' object has no attribute" Issue

Double check if there's a space in the column name. 'Survived ' vs 'Survived' It happens more often than you'd think especially with CSV data source.
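A small sketch of that check, assuming a made-up CSV whose first header has a trailing space. It prints the raw column names so the stray space is visible, then strips whitespace from every header.

```python
import io

import pandas as pd

# Hypothetical CSV text: note the trailing space in "Survived ".
csv_text = "Survived ,Sex\n1,female\n0,male\n"
df = pd.read_csv(io.StringIO(csv_text))

# Printing the list form shows the quotes, so stray spaces stand out.
print(df.columns.tolist())

# Normalise the header names by stripping surrounding whitespace.
df.columns = df.columns.str.strip()

survived = df["Survived"].tolist()
print(survived)
```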

More on reddit.com
🌐 r/learnpython
8
1
October 30, 2020
AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working
What happened: I perform a pipeline of transformations on a Dask dataframe originating from dd.read_sql_table() from a view in an oracle DB. In one stage that follows many successful stages, I try ... More on github.com
🌐 github.com
10
January 26, 2022
AttributeError: 'DataFrame' object has no attribute 'data'
I would recommend you start off by reading the 10min pandas Quick Guide . This should get you on the right track (at the very least you will be able to load data into pandas). Afterwards, you should also be able to tell why your current code doesn't make much sense. More on reddit.com
🌐 r/learnpython
4
0
September 29, 2021
Top answer
1 of 5
2

"sklearn.datasets" is a scikit package, where it contains a method load_iris().

By default, load_iris() returns an object that holds data, target and other members. To get the actual values, you have to read its data and target attributes.

'iris.csv', by contrast, holds the features and the target together.

FYI: If you set return_X_y to True in load_iris(), you get the features and target directly.

from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
2 of 5
1

The Iris Dataset from Sklearn is in Sklearn's Bunch format:

print(type(iris))
print(iris.keys())

output:

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

So, that's why you can access it as:

x=iris.data
y=iris.target

But when you read the CSV file into a DataFrame, as you described:

iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()

output is:

    2   3
0   petal_length    petal_width
1   1.4 0.2
2   1.4 0.2
3   1.3 0.2
4   1.5 0.2

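Here the column names are the integers 2 and 3, and the real header row ('petal_length', 'petal_width') has ended up as the first row of data.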

First of all you should read the CSV file as:

df = pd.read_csv('iris.csv')

You should not pass header=None, because your CSV file already includes the column names (the header row).
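The effect of header=None can be reproduced with a short sketch. The two data rows below are an assumed stand-in for the real iris.csv, read from a string so the example is self-contained.

```python
import io

import pandas as pd

# Assumed miniature version of iris.csv, with a header row.
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)

# header=None: columns get integer labels and the header row becomes data.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Default: the first row is used as the column names.
with_header = pd.read_csv(io.StringIO(csv_text))

print(no_header.columns.tolist())
print(with_header.columns.tolist())
```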

So, now what you can do is something like this:

X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'

or if you want to use the column names then:

X = df[['petal_length', 'petal_width']]
y = df['species']  # select by label; iloc only accepts integer positions

Also, if you want to convert the labels from strings to numbers, use sklearn's LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
🌐
Stack Overflow
stackoverflow.com › questions › 76139983 › dataframe-object-has-no-attribute-withcolumn-getting-this-error
'DataFrame' object has no attribute 'withColumn' Getting this error - Stack Overflow
April 30, 2023 -

from pyspark.sql import functions as f
from fuzzywuzzy import fuzz
from pyspark.sql.types import StringType
from pyspark.sql import SparkSession, DataFrame

def matchstring(s1, s2):
    return fuzz.token_sort_ratio(s1, s2)

MatchUDF = f.udf(matchstring, StringType())
spark = SparkSession.builder.appName("test").getOrCreate()
df_merged = ps.merge(df_Sale_KR,df_Dist_Mast, on='Distributor_ID', how='left')
df_similarity_score = df_merged.withColumn("similarity_score", MatchUDF(f.col("source"), f.col("target")))
df_similarity_score.show()

...

I can't currently test this, so it may be a bit off, but the pandas merge function doesn't return a PySpark DataFrame. It returns a pandas DataFrame, which, as far as I know, does not implement withColumn.
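One possible way to stay entirely in pandas is sketched below, under several assumptions: the two toy frames and their values are invented, `source` and `target` are assumed to live on the two input frames, and `difflib.SequenceMatcher` from the standard library is swapped in for `fuzz.token_sort_ratio`, so the scores are not the same metric as in the question.

```python
from difflib import SequenceMatcher

import pandas as pd


def match_string(s1, s2):
    # Stand-in scorer scaled to 0-100; NOT equivalent to fuzz.token_sort_ratio.
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


# Hypothetical inputs standing in for df_Sale_KR and df_Dist_Mast.
sales = pd.DataFrame({"Distributor_ID": [1, 2], "source": ["Acme Ltd", "Beta Co"]})
dist = pd.DataFrame({"Distributor_ID": [1, 2], "target": ["Acme Ltd", "Gamma Inc"]})

# pd.merge returns a pandas DataFrame, so the column is added the pandas way.
merged = pd.merge(sales, dist, on="Distributor_ID", how="left")
merged["similarity_score"] = [
    match_string(a, b) for a, b in zip(merged["source"], merged["target"])
]

print(merged)
```

The other direction is to convert the merged frame with `spark.createDataFrame(merged)` and keep the original `withColumn()` call and UDF.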
🌐
Apache
spark.apache.org › docs › latest › api › python › reference › pyspark.sql › api › pyspark.sql.DataFrame.withColumn.html
pyspark.sql.DataFrame.withColumn — PySpark 4.1.2 documentation
To avoid this, use select() with multiple columns at once. ...

>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df.withColumn('age2', df.age + 2).show()
+---+-----+----+
|age| name|age2|
+---+-----+----+
|  2|Alice|   4|
|  5|  Bob|   7|
+---+-----+----+
🌐
Plain English
python.plainenglish.io › how-to-fix-attributeerror-in-python-6cea86059a27
How to Fix AttributeError in Python? | by JOKEN VILLANUEVA | Python in Plain English
January 27, 2025 - The error message “‘DataFrame’ object has no attribute ‘withColumn’” occurs when you are trying to add a new column to a Pandas DataFrame using the withColumn() method.
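When it is unclear which library a DataFrame came from, inspecting its class can settle the question. The helper name `frame_flavor` below is hypothetical. It only reads the module path of the object's class, so it needs nothing beyond the object itself.

```python
import pandas as pd


def frame_flavor(df):
    """Return the top-level package that defines df's class, e.g. 'pandas' or 'pyspark'."""
    return type(df).__module__.split(".")[0]


pdf = pd.DataFrame({"x": [1, 2]})  # hypothetical toy frame
flavor = frame_flavor(pdf)
print(flavor)
```

A result of `pandas` means column-adding methods such as `assign()` apply; a Spark DataFrame would report `pyspark` and support `withColumn()`.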
🌐
Reddit
reddit.com › r/learnpython › "'dataframe' object has no attribute" issue
r/learnpython on Reddit: "'DataFrame' object has no attribute" Issue
October 30, 2020 -

I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.

A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:

##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]  
test_var = test.loc[:,['Sex']]  
BayesNet = BayesianModel([('Sex','Survived')])

I am supposed to add another variable, 'Pclass,' to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:

predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictions

For example, the error I get for this version of the code:

train_var = train.loc[:,['Survived','Pclass','Sex']]  
test_var = test.loc[:,['Pclass']]  
BayesNet = BayesianModel([('Sex','Pclass','Survived')])

is this:

AttributeError                            Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
      2 predictions

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'Survived'

Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.

Any help would be greatly appreciated. I know it's a lot.
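The traceback only says that the frame called `hypothesis` has no `Survived` column; the excerpt does not show why. A hedged way to investigate is to list the columns that are really present before using attribute access. The frame below is an invented stand-in, not the poster's data.

```python
import pandas as pd

# Hypothetical stand-in for a prediction frame that lacks a 'Survived' column.
hypothesis = pd.DataFrame({"Pclass": [1, 3]})

# Inspect what is really there first.
print(hypothesis.columns.tolist())

try:
    hypothesis.Survived  # attribute access only works for existing columns
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print(exc)

# An explicit membership test avoids the exception altogether.
has_survived = "Survived" in hypothesis.columns
print(has_survived)
```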

🌐
GitHub
github.com › dask › dask › issues › 8624
AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working · Issue #8624 · dask/dask
January 26, 2022 - { File "[redacted]/pandas/core/generic.py", line 5487, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'name'. Did you mean: 'rename'? }
Author   dask
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute-rename › td-p › 28109
Solved: AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 28109
January 2, 2024 - https://stackoverflow.com/questions/38134643/data-frame-object-has-no-attribute ... If df_boston is a DataFrame, but you still face issues, try an alternative syntax: df_boston = df_boston.rename(columns={'zn': 'Zoning'}).
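A minimal sketch of that rename call on a pandas frame. The values and the second column `crim` are assumptions added so the example runs on its own.

```python
import pandas as pd

# Hypothetical miniature frame; only the 'zn' column comes from the snippet above.
df_boston = pd.DataFrame({"zn": [18.0, 0.0], "crim": [0.006, 0.027]})

# rename() returns a new frame, so the result is assigned back.
df_boston = df_boston.rename(columns={"zn": "Zoning"})

print(df_boston.columns.tolist())
```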
🌐
Stack Overflow
stackoverflow.com › q › 46832357
apache spark - pyspark: DataFrame.withColumn() sometimes requires assignment to a new DataFrame with a different name - Stack Overflow
df_new = df.withColumn('AMOUNT', df.AMOUNT*lit(-1)) => works! When I use other methods or UDFs, it doesn't seem to exhibit the same weirdness. I can just assign the DataFrame back to itself.
🌐
Reddit
reddit.com › r/learnpython › attributeerror: 'dataframe' object has no attribute 'data'
r/learnpython on Reddit: AttributeError: 'DataFrame' object has no attribute 'data'
September 29, 2021 -
wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
dataset

ERROR:

<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
      2 dataset['target']=df.target
      3 dataset

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465             return object.__getattribute__(self, name)
   5466 
   5467     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'data'

I'm trying to set up a target to proceed with my Multi Linear Regression Project, but I can't even do that. I've already downloaded the CSV file and have it uploaded on a Jupyter Notebook. What am I doing wrong?
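Attributes such as `data`, `feature_names` and `target` belong to scikit-learn's Bunch objects, not to a DataFrame returned by `pd.read_csv`. A sketch of the usual alternative follows, assuming an invented CSV in which the target column is called `quality`; the real column names in combined.csv are not shown in the post.

```python
import io

import pandas as pd

# Hypothetical CSV contents; 'quality' is an assumed name for the target column.
csv_text = "alcohol,ph,quality\n9.4,3.51,5\n9.8,3.20,6\n"
wine = pd.read_csv(io.StringIO(csv_text))

# Features are every column except the target; the target is a single column.
X = wine.drop(columns=["quality"])
y = wine["quality"]
feature_names = X.columns.tolist()

print(feature_names)
print(y.tolist())
```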

🌐
YouTube
youtube.com › watch
How to fix AttributeError: 'DataFrame' object has no attribute 'columns' whe... in Python - YouTube
Hello, Dedicated Coders! 🖥️💡We're excited to share with you our newest video, "How to solve AttributeError: 'DataFrame' object has no attribute 'columns' ...
Published   May 5, 2024
Views   290
Top answer
1 of 2
7

The syntax you are using is for a pandas DataFrame. To achieve this for a spark DataFrame, you should use the withColumn() method. This works great for a wide range of well defined DataFrame functions, but it's a little more complicated for user defined mapping functions.

General Case

In order to define a udf, you need to specify the output data type. For instance, if you wanted to apply a function my_func that returned a string, you could create a udf as follows:

import pyspark.sql.functions as f
from pyspark.sql.types import StringType  # return type of the udf

my_udf = f.udf(my_func, StringType())

Then you can use my_udf to create a new column like:

df = df.withColumn('new_column', my_udf(f.col("some_column_name")))

Another option is to use select:

df = df.select("*", my_udf(f.col("some_column_name")).alias("new_column"))

Specific Problem

Using a udf

In your specific case, you want to use a dictionary to translate the values of your DataFrame.

Here is a way to define a udf for this purpose:

from pyspark.sql.types import IntegerType  # return type of the udf
some_map_udf = f.udf(lambda x: some_map.get(x, None), IntegerType())

Notice that I used dict.get() because you want your udf to be robust to bad inputs.

df = df.withColumn('new_column', some_map_udf(f.col("some_column_name")))

Using DataFrame functions

Sometimes using a udf is unavoidable, but whenever possible, using DataFrame functions is usually preferred.

Here is one option to do the same thing without using the udf.

The trick is to iterate over the items in some_map to create a list of pyspark.sql.functions.when() functions.

some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
print(some_map_func)
#[Column<CASE WHEN (some_column_name = a) THEN 0 END>,
# Column<CASE WHEN (some_column_name = c) THEN 1 END>,
# Column<CASE WHEN (some_column_name = b) THEN 1 END>]

Now you can use pyspark.sql.functions.coalesce() inside of a select:

df = df.select("*", f.coalesce(*some_map_func).alias("new_column"))

This works because when() returns null by default if the condition is not met, and coalesce() will pick the first non-null value it encounters. Since the keys of the map are unique, at most one column will be non-null.
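The reasoning in the last paragraph can be modelled in plain Python, with `None` standing in for Spark's null. The two helpers below are simplified stand-ins written for this illustration, not the PySpark functions themselves, and the contents of `some_map` are assumed from the printed CASE WHEN output.

```python
def when(value, key, result):
    """Model of f.when(col == key, result): the result if the condition holds, else None."""
    return result if value == key else None


def coalesce(*candidates):
    """Model of f.coalesce(): the first candidate that is not None, or None if all are."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


# Assumed mapping, consistent with the printed CASE WHEN expressions.
some_map = {"a": 0, "b": 1, "c": 1}


def translate(value):
    # One when() per map entry; at most one of them is non-None.
    return coalesce(*(when(value, k, v) for k, v in some_map.items()))


translated = [translate(v) for v in ["a", "b", "c", "z"]]
print(translated)
```

The explicit `is not None` test matters here: the mapped value 0 is falsy, so a plain truthiness check would wrongly skip it.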

2 of 2
1

You have a Spark DataFrame, not a pandas DataFrame. To add a new column to the Spark DataFrame:

import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())(F.col("some_column_name")))
df.show()
🌐
Polars
docs.pola.rs › py-polars › html › reference › dataframe › api › polars.DataFrame.with_columns.html
polars.DataFrame.with_columns — Polars documentation
Add columns to this DataFrame. Added columns will replace existing columns with the same name. ... Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
🌐
Saturn Cloud
saturncloud.io › blog › solving-the-dataframe-object-has-no-attribute-name-error-in-pandas
Solving the 'DataFrame Object Has No Attribute 'name' Error in Pandas | Saturn Cloud Blog
July 10, 2023 - Running this code will result in an AttributeError: 'DataFrame' object has no attribute 'name'. This is because a DataFrame as a whole does not have a 'name' attribute.
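A short sketch of the distinction, using an invented single-column frame: a Series selected from the frame carries a `name`, while the frame as a whole does not.

```python
import pandas as pd

# Hypothetical frame with one column called 'price'.
df = pd.DataFrame({"price": [1.5, 2.5]})

# A Series taken from the frame is named after its column.
series_name = df["price"].name

# The frame itself has no 'name' attribute, so hasattr() reports False.
frame_has_name = hasattr(df, "name")

print(series_name)
print(frame_has_name)
```

If the frame happened to contain a column literally called `name`, `df.name` would return that column instead of raising, which can hide the problem.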