attributeerror: 'dataframe' object has no attribute plot pyspark

I got the following error : 'DataFrame' object has no attribute 'data'

datascience.stackexchange.com › questions › 37435 › i-got-the-following-error-dataframe-object-has-no-attribute-data

sklearn.datasets is a scikit-learn module that contains the function load_iris().

By default, load_iris() returns an object that holds data, target and other members. To get the actual values, you have to read the data and target attributes yourself.

'iris.csv', by contrast, holds the features and the target together.

FYI: If you set return_X_y to True in load_iris(), you get the features and target directly.

from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)

Answer from vipin bansal on Stack Exchange

Stack Overflow

stackoverflow.com › questions › 61570636 › how-can-i-fix-data-frame-has-no-attribute-plot

python - How can I fix data frame has no attribute plot - Stack Overflow

May 3, 2020 -

dd=df.select(df.Color,df.ListPrice.cast("float"))
colordf = dd[['Color','ListPrice']]
colordfgroup = colordf.groupby('Color').mean('ListPrice')
colordfgroup.show()
my_plot = colordfgroup.plot(kind...

Cloudera Community

community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › td-p › 78093

Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

January 2, 2024 -

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT PersonType ,COUNT(PersonType) AS `Person Count` FROM Person_Person GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")

However, I'm now getting the following error message: AttributeError: 'list' object has no attribute 'saveAsTextFile'

Discussions

pyspark - DataFrame' object has no attribute 'get - Stack Overflow

I would like to plot y data frame but I had an error like below. Any suggestions? g1= sns.scatterplot( x= "events", y= "ways" , data = x2) Output message DataFrame' object has... More on stackoverflow.com

stackoverflow.com

pandas - PySpark : AttributeError: 'DataFrame' object has no attribute 'values' - Stack Overflow

22 ---> 23 api_param_df = ... api_param_df], axis=1) 25 /usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py in __getattr__(self, name) 1642 if name not in self.columns: 1643 raise AttributeError( -> 1644 "'%s' object has no attribute '%s'" % (self.__class__.__name__, ... More on stackoverflow.com

stackoverflow.com

python - Data frame object has no attribute "plt" - Stack Overflow

I am trying to do some machine learning work in python but i cant plot anything using matplotlib. When i try to run the program, i get the error as shown in the screenshot. error Here is the code ... More on stackoverflow.com

stackoverflow.com

Stack Exchange

datascience.stackexchange.com › questions › 37435 › i-got-the-following-error-dataframe-object-has-no-attribute-data

python - I got the following error : 'DataFrame' object has no attribute 'data' - Data Science Stack Exchange

Top answer

2 of 5

The Iris Dataset from Sklearn is in Sklearn's Bunch format:

print(type(iris))
print(iris.keys())

output:

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

So, that's why you can access it as:

x=iris.data
y=iris.target

But when you read the CSV file as DataFrame as mentioned by you:

iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()

output is:

              2            3
0  petal_length  petal_width
1           1.4          0.2
2           1.4          0.2
3           1.3          0.2
4           1.5          0.2

Here the column names are 2 and 3, and the real header row ('petal_length', 'petal_width') has ended up as the first row of data.

First of all you should read the CSV file as:

df = pd.read_csv('iris.csv')

You should not pass header=None, because your CSV file already includes the column names, i.e. the header row.

So, now what you can do is something like this:

X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'

or if you want to use the column names then:

X = df[['petal_length', 'petal_width']]
y = df['species']  # select the label column by name (iloc only accepts integer positions)

Also, if you want to convert the labels from strings to numbers, use sklearn's LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
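
As a rough illustration of what the encoding step above does, the sketch below reproduces the idea in plain Python: collect the distinct labels, sort them, and number them from 0. The helper name encode_labels and the sample species list are made up for this example; in practice scikit-learn's LabelEncoder is still the tool to use.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, numbering the sorted labels from 0."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# Hypothetical sample of the 'species' column.
species = ["setosa", "virginica", "setosa", "versicolor"]
codes, classes = encode_labels(species)
print(codes)    # [0, 2, 0, 1]
print(classes)  # ['setosa', 'versicolor', 'virginica']
```

Returning the sorted class list alongside the codes makes it possible to map the integers back to the original strings later.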

Oceanhackweek

oceanhackweek.org › ohw22 › tutorials › 02-Wed › 01-data-visualization-in-python › tutorial › 04_Basic_Plotting.html

Read in the data — OceanHackWeek

If you are familiar with the pandas.plot API, you might expect to execute df.plot.scatter(x='longitude', y='latitude'). Feel free to try this out in a new cell, but it will throw an error: AttributeError: 'DataFrame' object has no attribute 'plot'. In order to make the data more manageable for now, we’ll briefly use just a fraction (1%) of it and call that small_df.

Stack Overflow

stackoverflow.com › questions › 67267858 › dataframe-object-has-no-attribute-get

pyspark - DataFrame' object has no attribute 'get - Stack Overflow

Top answer

1 of 1

Seaborn expects pandas dataframes and not spark dataframes. Change your code to:

g1= sns.scatterplot( x= "events", y= "ways" , data = x2.toPandas())

and it should work.
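
If the same plotting code may receive either kind of DataFrame, a small guard can do the conversion only when it is needed. This is a minimal sketch, assuming that Spark DataFrames can be recognised by their class living in a pyspark module and exposing toPandas(); the helper name as_pandas and the FakeSparkFrame stand-in are hypothetical and exist only so the example runs without a Spark session.

```python
def as_pandas(frame):
    """Return frame.toPandas() for Spark-like frames, otherwise frame unchanged."""
    module = type(frame).__module__ or ""
    if module.startswith("pyspark.") and hasattr(frame, "toPandas"):
        # toPandas() collects every row to the driver, so keep the input small.
        return frame.toPandas()
    return frame


class FakeSparkFrame:
    """Stand-in used only to exercise the helper without a Spark session."""
    __module__ = "pyspark.sql.dataframe"

    def toPandas(self):
        return "pandas frame"


print(as_pandas(FakeSparkFrame()))  # pandas frame
print(as_pandas([1, 2, 3]))         # [1, 2, 3]
```

With a real Spark DataFrame the call would look like sns.scatterplot(x="events", y="ways", data=as_pandas(x2)).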

Apache

spark.apache.org › docs › latest › api › python › reference › pyspark.pandas › frame.html

DataFrame — PySpark 4.1.1 documentation - Apache Spark

DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.

Stack Overflow

stackoverflow.com › questions › 68550053 › pyspark-attributeerror-dataframe-object-has-no-attribute-values

pandas - PySpark : AttributeError: 'DataFrame' object has no attribute 'values' - Stack Overflow

Top answer

1 of 1

The syntax is valid with Pandas DataFrames but that attribute doesn't exist for the PySpark created DataFrames. You can check out this link for the documentation.

Usually, the collect() method or the .rdd attribute would help you with these tasks.

You can use the following snippet to produce the desired result:

import numpy as np
import pandas as pd
# Split each http_path on '?', then pad rows without a query string with NaN
http_path = sdf.rdd.map(lambda row: row['http_path'].split('?'))
api_param_df = pd.DataFrame([[row[0], np.nan] if len(row) == 1 else row for row in http_path.collect()], columns=["api", "param"])
sdf = pd.concat([sdf.toPandas()['raw'], api_param_df], axis=1)

Note that I removed the comments to make it more readable and I've also substituted the regex with a simple split.
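
The traceback quoted for this question ends inside __getattr__ in pyspark/sql/dataframe.py, which raises the error whenever the requested name is not one of the DataFrame's columns. The toy class below imitates that lookup in plain Python to show why sdf.values fails while sdf.http_path works. MiniFrame is an illustrative stand-in, not the real PySpark class, and the string it returns is only a placeholder for a Column object.

```python
class MiniFrame:
    """Toy stand-in for a Spark DataFrame; not the real PySpark class."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup has failed.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'" % (self.__class__.__name__, name)
            )
        return "Column<%s>" % name


sdf = MiniFrame(["raw", "http_path"])
print(sdf.http_path)  # Column<http_path>
try:
    sdf.values
except AttributeError as exc:
    print(exc)  # 'MiniFrame' object has no attribute 'values'
```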

Stack Overflow

stackoverflow.com › questions › 37275790 › pandas-dataframe-has-no-plot-function

python 3.x - Pandas Dataframe has no Plot function - Stack Overflow

Top answer

1 of 1

I think your pandas version is older than 0.17.0.

See DataFrame.plot.scatter:

New in version 0.17.0.

In older versions you can use:

df.plot(kind='scatter')
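
To check which of the two spellings applies, the installed version can be compared against 0.17.0. The sketch below is one way to do that with the standard library only; the helper name supports_plot_accessor is hypothetical, and it deliberately ignores pre-release suffixes such as "rc1".

```python
import re


def supports_plot_accessor(version):
    """Return True if a pandas version string is 0.17.0 or newer."""
    # Keep the first three numeric components, e.g. '0.16.2' -> [0, 16, 2].
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad short strings such as '0.17'
    return tuple(numbers) >= (0, 17, 0)


print(supports_plot_accessor("0.16.2"))  # False
print(supports_plot_accessor("0.17.0"))  # True
```

In a real session the argument would be pd.__version__.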

Python Forum

python-forum.io › thread-33991.html

AttributeError: 'DataFrame' object has no attribute 'Articles'

Purpose: I want to plot feature importance for data prediction, training and testing. Runtime error: AttributeError: 'DataFrame' object has no attribute 'Articles' Error: Traceback (most recent call last): File 'D:/Clustering/text-cluster...

Databricks Community

community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132

AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132

February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...

Stack Overflow

stackoverflow.com › questions › 53312797 › data-frame-object-has-no-attribute-plt

python - Data frame object has no attribute "plt" - Stack Overflow

Top answer

1 of 1

I think you're looking for dataset.plot not plt.

Spark By {Examples}

sparkbyexamples.com › home › hbase › attributeerror: ‘dataframe’ object has no attribute ‘map’ in pyspark

AttributeError: 'DataFrame' object has no attribute 'map' in PySpark - Spark By {Examples}

March 27, 2024 - A PySpark DataFrame doesn’t have a map() transformation; map() is defined on RDDs, which is why you get the error AttributeError: ‘DataFrame’ object has no attribute ‘map’

Cumulative Sum

cumsum.wordpress.com › 2020 › 10 › 10 › pyspark-attributeerror-dataframe-object-has-no-attribute-_get_object_id

[pyspark] AttributeError: ‘DataFrame’ object has no attribute ‘_get_object_id’

October 10, 2020 - AttributeError: ‘DataFrame’ object has no attribute ‘_get_object_id’ · The reason is that isin expects actual local values or collections, but df2.select('id') returns a DataFrame.

Stack Overflow

stackoverflow.com › questions › 54347570 › attribute-error-pyspark-dataframe-error › 54347883

apache spark sql - Attribute error? Pyspark dataframe() error - Stack Overflow

Top answer

1 of 1

It means that you have an object of type Dataframe, and you are trying to call the attribute dataframe which does not exist.

Stack Overflow

stackoverflow.com › questions › 50686616 › dataframe-object-has-no-attribute-apply-when-trying-to-apply-lambda-to-cre

python - "'DataFrame' object has no attribute 'apply'" when trying to apply lambda to create new column - Stack Overflow

Top answer

1 of 2

The syntax you are using is for a pandas DataFrame. To achieve this for a spark DataFrame, you should use the withColumn() method. This works great for a wide range of well defined DataFrame functions, but it's a little more complicated for user defined mapping functions.

General Case

In order to define a udf, you need to specify the output data type. For instance, if you wanted to apply a function my_func that returned a string, you could create a udf as follows:

import pyspark.sql.functions as f
from pyspark.sql.types import StringType
my_udf = f.udf(my_func, StringType())

Then you can use my_udf to create a new column like:

df = df.withColumn('new_column', my_udf(f.col("some_column_name")))

Another option is to use select:

df = df.select("*", my_udf(f.col("some_column_name")).alias("new_column"))

Specific Problem

Using a udf

In your specific case, you want to use a dictionary to translate the values of your DataFrame.

Here is a way to define a udf for this purpose:

from pyspark.sql.types import IntegerType
some_map_udf = f.udf(lambda x: some_map.get(x, None), IntegerType())

Notice that I used dict.get() because you want your udf to be robust to bad inputs.

df = df.withColumn('new_column', some_map_udf(f.col("some_column_name")))
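
The reason for preferring dict.get() inside the udf can be seen without Spark at all. The snippet below is a plain-Python sketch with a made-up some_map; it contrasts get(), which returns None for an unknown key, with square-bracket indexing, which raises KeyError and would make the udf fail on the first unexpected value.

```python
# Hypothetical mapping, used only for this illustration.
some_map = {"a": 0, "b": 1, "c": 1}


def translate(value):
    # dict.get returns None for unknown keys instead of raising KeyError.
    return some_map.get(value, None)


print(translate("a"))        # 0
print(translate("missing"))  # None
try:
    some_map["missing"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'missing'
```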

Using DataFrame functions

Sometimes using a udf is unavoidable, but whenever possible, using DataFrame functions is usually preferred.

Here is one option to do the same thing without using the udf.

The trick is to iterate over the items in some_map to create a list of pyspark.sql.functions.when() functions.

some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
print(some_map_func)
#[Column<CASE WHEN (some_column_name = a) THEN 0 END>,
# Column<CASE WHEN (some_column_name = c) THEN 1 END>,
# Column<CASE WHEN (some_column_name = b) THEN 1 END>]

Now you can use pyspark.sql.functions.coalesce() inside of a select:

df = df.select("*", f.coalesce(*some_map_func).alias("new_column"))

This works because when() returns null by default if the condition is not met, and coalesce() will pick the first non-null value it encounters. Since the keys of the map are unique, at most one column will be non-null.
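
The same logic can be traced in plain Python. The functions below are analogues written for this illustration, not the PySpark functions themselves: when() yields None unless its condition holds, and coalesce() returns the first value that is not None. The mapping is a made-up example.

```python
# Hypothetical mapping, used only for this illustration.
some_map = {"a": 0, "b": 1, "c": 1}


def when(value, key, result):
    """Analogue of when(col == key, result): None unless the condition holds."""
    return result if value == key else None


def coalesce(*candidates):
    """Return the first candidate that is not None, or None if all are None."""
    return next((c for c in candidates if c is not None), None)


def translate(value):
    # One when() per map entry; at most one of them is not None.
    return coalesce(*(when(value, k, v) for k, v in some_map.items()))


print([translate(v) for v in ["a", "b", "z"]])  # [0, 1, None]
```

Testing for "is not None" rather than truthiness matters here, because a mapped value of 0 is a valid result.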

2 of 2

You have a spark dataframe, not a pandas dataframe. To add new column to the spark dataframe:

import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())(F.col("some_column_name")))
df.show()

reddit.com › r/learnpython › attributeerror: 'dataframe' object has no attribute 'data'

r/learnpython on Reddit: AttributeError: 'DataFrame' object has no attribute 'data'

September 29, 2021 -

wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
dataset

ERROR:

<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
      2 dataset['target']=df.target
      3 dataset

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465             return object.__getattribute__(self, name)
   5466 
   5467     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'data'

I'm trying to set up a target to proceed with my Multi Linear Regression Project, but I can't even do that. I've already downloaded the CSV file and have it uploaded on a Jupyter Notebook. What am I doing wrong?

Top answer

1 of 3

I would recommend you start off by reading the 10min pandas Quick Guide. This should get you on the right track (at the very least you will be able to load data into pandas). Afterwards, you should also be able to tell why your current code doesn't make much sense.

2 of 3

Is there a column in your file called data? You already have a df from the first line. Maybe you just want to rename the columns with df.rename(columns={}).

Kanaries

docs.kanaries.net › topics › Matplotlib › matplotlib-has-no-attribute-plot

Troubleshooting: 'Module Matplotlib Has No Attribute Plot' in Python – Kanaries

May 2, 2023 - Your complete guide to solving the 'module matplotlib has no attribute plot' error in Python, covering both installation and syntax issues with detailed examples.

Python.org

discuss.python.org › python help

AttributeError: 'DataFrame' object has no attribute 'Close' - Python Help - Discussions on Python.org

May 2, 2021 - Hi. As a newbie in Python, I don't understand what this error message means, because the code has no "close" term... How do I fix the code so that the graph displays? Thank you. (Please see attached.)

def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data.Date, infer_datetime_format=True), y=stock_data.Cl...

GitHub

github.com › Nixtla › statsforecast › discussions › 779

Input dataframe for statsforecast · Nixtla/statsforecast · Discussion #779

February 9, 2024 - I converted it to a pyspark.pandas.frame.DataFrame and it looks good. However, when i apply the function StatsForecast.plot(pd_df) (being pd_df my pandas dataframe), i get the following error: AttributeError: 'NoneType' object has no attribute 'Date' Function StatsForecast and forecast method work ok.

Author Nixtla

Stack Exchange

datascience.stackexchange.com › questions › 90606 › attributeerror-dataframe-object-has-no-attribute-values

pandas - AttributeError: 'DataFrame' object has no attribute 'Values' - Data Science Stack Exchange

Top answer

1 of 2

You tried to use .Values with a capital V instead of .values. Changing the capital V to a lowercase v should fix the error you're getting.

2 of 2

A pandas DataFrame does not have an attribute called Values; the attribute is values, with a lowercase "v". For further reference, see the pandas documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html
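
Capitalisation slips like this one can be caught mechanically. The helper below is a hedged sketch that works on any Python object: it first looks for a public attribute that differs from the requested name only by case, then falls back to difflib for near misses. The name suggest_attribute and the toy Frame class are invented for the example.

```python
import difflib


def suggest_attribute(obj, name):
    """Return public attribute names on obj that resemble a misspelled name."""
    candidates = [attr for attr in dir(obj) if not attr.startswith("_")]
    # A pure capitalisation mistake is the most common case, so check it first.
    by_lower = {}
    for attr in candidates:
        by_lower.setdefault(attr.lower(), attr)
    if name.lower() in by_lower:
        return [by_lower[name.lower()]]
    return difflib.get_close_matches(name, candidates, n=3)


class Frame:
    """Toy object standing in for a DataFrame."""
    values = [[1, 2], [3, 4]]
    columns = ["a", "b"]


print(suggest_attribute(Frame(), "Values"))  # ['values']
```

Called on a real DataFrame, the same function would inspect that object's attributes through dir().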