"sklearn.datasets" is a scikit-learn module that provides the function load_iris().

By default, load_iris() returns an object that holds data, target and other members. To get the actual values, you have to read the object's data and target attributes yourself.

'iris.csv', by contrast, holds the features and the target together in one table.

FYI: if you set return_X_y=True in load_iris(), you get the features and the target directly.

from sklearn import datasets
data, target = datasets.load_iris(return_X_y=True)
Answer from vipin bansal on Stack Exchange
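
To make the two return styles concrete, here is a minimal sketch, assuming scikit-learn is installed. The variable names (X_bunch, y_bunch, X, y) are illustrative only and do not come from the answer above.

```python
from sklearn import datasets

# Default call: one container object whose members are read as attributes.
iris = datasets.load_iris()
X_bunch = iris.data      # feature matrix, one row per flower
y_bunch = iris.target    # integer class label for each row

# With return_X_y=True the same two arrays come back as a (data, target) tuple.
X, y = datasets.load_iris(return_X_y=True)

print(type(iris).__name__)
print(X.shape, y.shape)
print((X == X_bunch).all(), (y == y_bunch).all())
```

Either way the arrays are identical; return_X_y=True just skips the attribute lookups when the metadata (feature names, description) is not needed.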
🌐
Stack Overflow
stackoverflow.com › questions › 61570636 › how-can-i-fix-data-frame-has-no-attribute-plot
python - How can I fix data frame has no attribute plot - Stack Overflow
May 3, 2020 - dd=df.select(df.Color,df.ListPrice.cast("float")) colordf = dd[['Color','ListPrice']] colordfgroup = colordf.groupby('Color').mean('ListPrice') colordfgroup.show() my_plot = colordfgroup.plot(kind...
🌐
Cloudera Community
community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › td-p › 78093
Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
January 2, 2024 -
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT PersonType ,COUNT(PersonType) AS `Person Count` FROM Person_Person GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")

However, I'm now getting the following error message: AttributeError: 'list' object has no attribute 'saveAsTextFile'
Discussions

pyspark - DataFrame' object has no attribute 'get - Stack Overflow
I would like to plot my data frame but I got the error below. Any suggestions? g1= sns.scatterplot( x= "events", y= "ways" , data = x2) Output message DataFrame' object has... More on stackoverflow.com
🌐 stackoverflow.com
pandas - PySpark : AttributeError: 'DataFrame' object has no attribute 'values' - Stack Overflow
22 ---> 23 api_param_df = ... api_param_df], axis=1) 25 /usr/local/lib/python3.7/dist-packages/pyspark/sql/dataframe.py in __getattr__(self, name) 1642 if name not in self.columns: 1643 raise AttributeError( -> 1644 "'%s' object has no attribute '%s'" % (self.__class__.__name__, ... More on stackoverflow.com
🌐 stackoverflow.com
python 3.x - Pandas Dataframe has no Plot function - Stack Overflow
More on stackoverflow.com
🌐 stackoverflow.com
python - Data frame object has no attribute "plt" - Stack Overflow
I am trying to do some machine learning work in Python but I can't plot anything using matplotlib. When I try to run the program, I get the error shown in the screenshot. Here is the code ... More on stackoverflow.com
🌐 stackoverflow.com
2 of 5
1

The Iris Dataset from Sklearn is in Sklearn's Bunch format:

from sklearn.datasets import load_iris
iris = load_iris()
print(type(iris))
print(iris.keys())

output:

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

So, that's why you can access it as:

x=iris.data
y=iris.target

But when you read the CSV file into a DataFrame the way you described:

iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()

output is:

    2   3
0   petal_length    petal_width
1   1.4 0.2
2   1.4 0.2
3   1.3 0.2
4   1.5 0.2

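Here the column names are the integers 2 and 3, and the real header row ('petal_length', 'petal_width') has ended up as the first row of data.

First of all, you should read the CSV file as:

df = pd.read_csv('iris.csv')

Do not pass header=None, because your CSV file already includes the column names, i.e. the header row.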

So, now what you can do is something like this:

X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'

or, if you want to use the column names:

X = df[['petal_length', 'petal_width']]
y = df['species']

Also, if you want to convert the labels from strings to a numerical format, use sklearn's LabelEncoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
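
The steps in this answer can be put together as a small self-contained sketch. It assumes pandas and scikit-learn are installed, and it uses a three-row inline CSV (csv_text, a hypothetical stand-in for iris.csv) so that it runs without the real file.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for iris.csv: a header row plus three data rows.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# header=None treats the header row as data, so the columns are named 0..4.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# The default uses the first row as the column names.
df = pd.read_csv(io.StringIO(csv_text))

X = df[["petal_length", "petal_width"]]  # feature columns selected by name
y = df["species"]                        # label column selected by name

# Encode the string labels as integers.
le = LabelEncoder()
y_encoded = le.fit_transform(y)

print(list(raw.columns))
print(list(df.columns))
print(list(y_encoded), list(le.classes_))
```

Selecting by name with df[...] avoids the positional guesswork of iloc, and le.classes_ keeps the mapping from integers back to species names.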
🌐
Oceanhackweek
oceanhackweek.org › ohw22 › tutorials › 02-Wed › 01-data-visualization-in-python › tutorial › 04_Basic_Plotting.html
Read in the data — OceanHackWeek
If you are familiar with the pandas.plot API, you might expect to execute df.plot.scatter(x='longitude', y='latitude'). Feel free to try this out in a new cell, but it will throw an error: AttributeError: 'DataFrame' object has no attribute 'plot'. In order to make the data more manageable for now, we’ll briefly use just a fraction (1%) of it and call that small_df.
🌐
Apache
spark.apache.org › docs › latest › api › python › reference › pyspark.pandas › frame.html
DataFrame — PySpark 4.1.1 documentation - Apache Spark
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
🌐
Python Forum
python-forum.io › thread-33991.html
AttributeError: 'DataFrame' object has no attribute 'Articles'
Purpose: I want to plot feature importance for data prediction, training and testing. Runtime error: AttributeError: 'DataFrame' object has no attribute 'Articles' Error: Traceback (most recent call last): File 'D:/Clustering/text-cluster...
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132
AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132
February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...
🌐
Cumulative Sum
cumsum.wordpress.com › 2020 › 10 › 10 › pyspark-attributeerror-dataframe-object-has-no-attribute-_get_object_id
[pyspark] AttributeError: ‘DataFrame’ object has no attribute ‘_get_object_id’
October 10, 2020 - AttributeError: ‘DataFrame’ object has no attribute ‘_get_object_id’ · The reason is that isin expects actual local values or collections, but df2.select('id') returns a DataFrame.
🌐
Spark By {Examples}
sparkbyexamples.com › home › hbase › attributeerror: ‘dataframe’ object has no attribute ‘map’ in pyspark
AttributeError: 'DataFrame' object has no attribute 'map' in PySpark - Spark By {Examples}
March 27, 2024 - A PySpark DataFrame doesn’t have a map() transformation; map() is defined on RDD, which is why you get the error AttributeError: ‘DataFrame’ object has no attribute ‘map’
Top answer
1 of 2
7

The syntax you are using is for a pandas DataFrame. To achieve this with a Spark DataFrame, you should use the withColumn() method. This works well for a wide range of well-defined DataFrame functions, but it's a little more complicated for user-defined mapping functions.

General Case

In order to define a udf, you need to specify the output data type. For instance, if you wanted to apply a function my_func that returned a string, you could create a udf as follows:

import pyspark.sql.functions as f
from pyspark.sql.types import StringType
my_udf = f.udf(my_func, StringType())

Then you can use my_udf to create a new column like:

df = df.withColumn('new_column', my_udf(f.col("some_column_name")))

Another option is to use select:

df = df.select("*", my_udf(f.col("some_column_name")).alias("new_column"))

Specific Problem

Using a udf

In your specific case, you want to use a dictionary to translate the values of your DataFrame.

Here is a way to define a udf for this purpose:

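from pyspark.sql.types import IntegerType
some_map_udf = f.udf(lambda x: some_map.get(x, None), IntegerType())

Notice that I used dict.get() because you want your udf to be robust to bad inputs: a key missing from some_map yields null instead of raising an error. Then apply the udf: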

df = df.withColumn('new_column', some_map_udf(f.col("some_column_name")))

Using DataFrame functions

Sometimes using a udf is unavoidable, but whenever possible, using DataFrame functions is usually preferred.

Here is one option to do the same thing without using the udf.

The trick is to iterate over the items in some_map to create a list of pyspark.sql.functions.when() functions.

some_map_func = [f.when(f.col("some_column_name") == k, v) for k, v in some_map.items()]
print(some_map_func)
#[Column<CASE WHEN (some_column_name = a) THEN 0 END>,
# Column<CASE WHEN (some_column_name = c) THEN 1 END>,
# Column<CASE WHEN (some_column_name = b) THEN 1 END>]

Now you can use pyspark.sql.functions.coalesce() inside of a select:

df = df.select("*", f.coalesce(*some_map_func).alias("new_column"))

This works because when() returns null by default if the condition is not met, and coalesce() will pick the first non-null value it encounters. Since the keys of the map are unique, at most one column will be non-null.

2 of 2
1

You have a Spark DataFrame, not a pandas DataFrame. To add a new column to the Spark DataFrame:

import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())(F.col('some_column_name')))
df.show()
🌐
Reddit
reddit.com › r/learnpython › attributeerror: 'dataframe' object has no attribute 'data'
r/learnpython on Reddit: AttributeError: 'DataFrame' object has no attribute 'data'
September 29, 2021 -
wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
dataset

ERROR:

<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
      2 dataset['target']=df.target
      3 dataset

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465             return object.__getattribute__(self, name)
   5466 
   5467     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'data'

I'm trying to set up a target to proceed with my Multiple Linear Regression project, but I can't even do that. I've already downloaded the CSV file and uploaded it to a Jupyter Notebook. What am I doing wrong?

🌐
Kanaries
docs.kanaries.net › topics › Matplotlib › matplotlib-has-no-attribute-plot
Troubleshooting: 'Module Matplotlib Has No Attribute Plot' in Python – Kanaries
May 2, 2023 - Your complete guide to solving the 'module matplotlib has no attribute plot' error in Python, covering both installation and syntax issues with detailed examples.
🌐
Python.org
discuss.python.org › python help
AttributeError: 'DataFrame' object has no attribute 'Close' - Python Help - Discussions on Python.org
May 2, 2021 - Hi. As a newbie in Python, I don't understand what this error message means, because the code has no "close" term… How do I fix the code so that the graph displays? Thank you. (Please see attached.)

def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data.Date, infer_datetime_format=True), y=stock_data.Cl...
🌐
GitHub
github.com › Nixtla › statsforecast › discussions › 779
Input dataframe for statsforecast · Nixtla/statsforecast · Discussion #779
February 9, 2024 - I converted it to a pyspark.pandas.frame.DataFrame and it looks good. However, when I apply the function StatsForecast.plot(pd_df) (pd_df being my pandas dataframe), I get the following error: AttributeError: 'NoneType' object has no attribute 'Date'. The StatsForecast function and the forecast method work OK.
Author   Nixtla