A short, clean, scalable solution
Change some columns, leave the rest untouched
import pyspark.sql.functions as F
# That's not part of the solution, just the creation of a sample DataFrame
# df = spark.createDataFrame([(10, 1,2,3,4),(20, 5,6,7,8)],'Id int, Revenue int, GROSS_PROFIT int, Net_Income int, Enterprise_Value int')
cols_to_cast = ["Revenue", "GROSS_PROFIT", "Net_Income", "Enterprise_Value"]
df = df.select([F.col(c).cast('double') if c in cols_to_cast else c for c in df.columns])
df.printSchema()
root
|-- Id: integer (nullable = true)
|-- Revenue: double (nullable = true)
|-- GROSS_PROFIT: double (nullable = true)
|-- Net_Income: double (nullable = true)
|-- Enterprise_Value: double (nullable = true)
Answer from David דודו Markovitz on Stack Overflow.
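For comparison, the same cast can also be done column by column with withColumn; a minimal sketch, assuming df and cols_to_cast are defined as above:
import pyspark.sql.functions as F

# cast each listed column in place; all other columns are left untouched
for c in cols_to_cast:
    df = df.withColumn(c, F.col(c).cast('double'))
df.printSchema()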
If this helps:
from pyspark.sql.functions import col

df = spark.createDataFrame([(1, 0),
                            (2, 1),
                            (3, 1),
                            (4, 1),
                            (5, 0),
                            (6, 0),
                            (7, 1),
                            (8, 1),
                            (9, 1),
                            (10, 1),
                            (11, 0),
                            (12, 0)],
                           ('Time', 'Tag1'))
df = df.withColumn('a', col('Time').cast('integer')).withColumn('a1', col('Tag1').cast('double'))
df.printSchema()
df.show()
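If the goal is to change the types of the existing columns in place rather than add new ones, the same casts can overwrite Time and Tag1 directly; a minimal sketch under that assumption, reusing the col import above:
# overwrite the original columns instead of adding 'a' and 'a1'
df = df.withColumn('Time', col('Time').cast('integer')) \
       .withColumn('Tag1', col('Tag1').cast('double'))
df.printSchema()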
I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.
A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:
##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]
test_var = test.loc[:,['Sex']]
BayesNet = BayesianModel([('Sex','Survived')])
I am supposed to add another variable, 'Pclass', to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictions
For example, the error I get for this version of the code:
train_var = train.loc[:,['Survived','Pclass','Sex']]
test_var = test.loc[:,['Pclass']]
BayesNet = BayesianModel([('Sex','Pclass','Survived')])
is this:
AttributeError Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
2 predictions
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'Survived'
Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.
Any help would be greatly appreciated. I know it's a lot.
Double-check whether there's a space in the column name ('Survived ' vs 'Survived'). It happens more often than you'd think, especially with a CSV data source.
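A quick way to check for, and strip, stray whitespace in the column names (a minimal sketch, assuming train is the pandas DataFrame loaded from the CSV):
print(train.columns.tolist())              # look for names like 'Survived '
train.columns = train.columns.str.strip()  # drop leading/trailing whitespace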
It's an issue with how you're referring to the data and whether it's actually there.
train.loc[:,['Survived','Sex']]
tells me that there's a DataFrame (which is from pandas, hence the error) called train and this line is trying to access parts of that dataframe (it's just a type of an array). Specifically, it's trying to access columns named Survived and Sex.
Similarly, this line tells me there's another dataframe (df) called test with a column named Sex, and this accesses that data.
test.loc[:,['Sex']]
The error message also tells me a few things:
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
There's another df called predictions, built from a dict, which is trying to access information from another df called hypothesis. The attribute it's trying to access in the second key of the dict is
hypothesis.Survived.tolist()
which is a way of calling a column from that df. That is, when the predictions line is executed, it's trying to pull all the values from the Survived column of the hypothesis df.
The error is that the df doesn't actually have a column named Survived. So either there's missing data, or you're calling it wrong, or there's a missing reference.
Without knowing more about your code and your question, I can't really extrapolate much more.
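Before building predictions, it can help to look at what hypothesis actually contains (a minimal sketch; the traceback above already tells us it is a pandas DataFrame):
print(type(hypothesis))
print(hypothesis.columns.tolist())  # check whether a 'Survived' column really exists
print(hypothesis.head())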
"sklearn.datasets" is a scikit package, where it contains a method load_iris().
load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.
Whereas 'iris.csv', holds feature and target together.
FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
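As a quick sanity check on what comes back (iris has 150 samples and 4 features, so the shapes should look like this):
print(data.shape)    # (150, 4) -> the feature matrix
print(target.shape)  # (150,)   -> the class labels 0, 1, 2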
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file into a DataFrame the way you mentioned:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
2 3
0 petal_length petal_width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
Here the column names are just the integer positions 2 and 3, and the real header row ('petal_length', 'petal_width') has been read in as the first row of data.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not include header=None, because your CSV file includes the column names, i.e. the headers.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']  # label column by name
Also, if you want to convert labels from string to numerical format, use sklearn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
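A small usage sketch of the encoder on iris-style species strings (the example labels here are illustrative):
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
labels = ['setosa', 'versicolor', 'virginica', 'setosa']   # illustrative labels
encoded = le.fit_transform(labels)
print(encoded)                        # e.g. [0 1 2 0]
print(le.classes_)                    # the original string labels, sorted
print(le.inverse_transform(encoded))  # map the numbers back to strings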
The timestamp column doesn't exist yet when you try to refer to it. You can either use pyspark.sql.functions.col to refer to it dynamically, without specifying which DataFrame object the column belongs to:
import pyspark.sql.functions as F
df = df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp")).filter(F.col("unix_timestamp") > hours_36)
Or without creating the intermediate column:
df.filter(df.unix_timestamp.cast("timestamp") > hours_36)
The API Doc tells me that you can also use a String notation for filtering: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.filter
import pyspark.sql.functions as F
df = (df.withColumn("unix_timestamp", df.unix_timestamp.cast("timestamp"))
        .filter("unix_timestamp > %s" % hours_36))
Maybe it's not as efficient, though.
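Putting the F.col variant together as a self-contained sketch (the column name unix_timestamp comes from the question; the cutoff literal below is a made-up stand-in for hours_36):
import pyspark.sql.functions as F

# hypothetical cutoff standing in for hours_36
cutoff = F.lit("2021-01-01 00:00:00").cast("timestamp")

filtered = df.filter(F.col("unix_timestamp").cast("timestamp") > cutoff)
filtered.show()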
The last line in the code below gives an AttributeError (see description in the title). The code below is immediately followed by a loop. Any ideas how I can properly refer to the 'Date' column, which is the first column in the CSV file? PS: I tried d0 = p('Date') and that doesn't give me anything either. Help, please.
import numpy as np
import pandas as pd
import scipy as sp
import statsmodels.api as sm
from datetime import date
ticker = 'IBM'
begdate= date(2012,1,1)
enddate= date(2016,12,31)
p = pd.read_csv('/Users/myname/Downloads/IBM_M.csv',
index_col=0,
parse_dates=["Date"])
print(p.head())
#calculate log returns
p['log_ret'] = np.log(p['Adj Close']) - np.log(p['Adj Close'].shift(1))
logret = p['log_ret']
print(logret.head())
ddate=[]
d0=p.date
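Since the CSV was read with index_col=0 and parse_dates=["Date"], the Date column has most likely become the DataFrame's index rather than an ordinary column, which is why p.date (and p.Date) fails; a small sketch of how one might reach those dates under that assumption:
# 'Date' was consumed as the parsed index, so go through p.index
print(p.index[:5])        # the parsed dates
ddate = p.index.tolist()  # as a plain Python list, if needed for the loop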