pd.read_csv() already returns a DataFrame, and a DataFrame does not have a .to_dataframe() method.
You can check the type of your variable ds with print(type(ds)); you will see that it is a pandas DataFrame.
As I understand it, you are loading loanapp_c.csv into ds with this code:
ds = pd.read_csv('desktop/python ML/loanapp_c.csv')
Here ds is a DataFrame object. You are calling to_dataframe on an object that is already a DataFrame.
Removing the line dataset = ds.to_dataframe() from your code should resolve the error.
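To see this for yourself, here is a minimal sketch. It reads a small in-memory CSV instead of loanapp_c.csv, so the column names and values below are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for loanapp_c.csv (columns and values are invented)
csv_text = "applicant,amount\nA,1000\nB,2500\n"

ds = pd.read_csv(io.StringIO(csv_text))

print(type(ds))                     # the pandas DataFrame class
print(hasattr(ds, "to_dataframe"))  # False: DataFrame has no such method

# ds is already the DataFrame, so use it directly
dataset = ds
```

With a real file, only the argument to pd.read_csv() changes; the returned object is still a DataFrame.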
Check your DataFrame's column names with data.columns.
It should print something like this:
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden whitespace. Then you can rename the column with:
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is actually something like " Number" or "Number ", i.e. there is a residual space in the column name. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is raised because dot notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a dot. pandas columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a quick way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute name is spelled correctly.
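As a quick illustration of the whitespace case, here is a small sketch with made-up column names (' Number' and 'Name ' are assumptions, not taken from your data):

```python
import pandas as pd

# Hypothetical frame whose column names carry stray spaces
data = pd.DataFrame({" Number": [1, 2], "Name ": ["a", "b"]})

# Wrapping each name in <> makes leading/trailing spaces visible
for col in data.columns:
    print("<{}>".format(col))

# data.Number would raise AttributeError here, because the real name is ' Number'
data.columns = data.columns.str.strip()

total = data.Number.sum()  # dot access works once the names are clean
print(list(data.columns), total)
```

Bracket access (data['Number']) behaves the same way: it only works once the stored name matches exactly.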
When you put .show() at the end, the result is no longer a PySpark DataFrame, because .show() only prints the rows and returns None.
Remove it and it should work:
tx_ecommerce = tx_df.filter(tx_df["POS_Cardholder_Presence"] == "ECommerce")
tx_ecommerce.toPandas()
You can also read a Parquet file directly with pandas and filter it like this:
import pandas as pd
txt = pd.read_parquet("/data/file.parquet")
txt_ecommerce = txt.loc[txt.POS_Cardholder_Presence == "ECommerce"]
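The same filter can be tried without a Parquet file. The sketch below builds a small in-memory frame (the Amount column and all values are made up) and applies the boolean mask with .loc:

```python
import pandas as pd

# Hypothetical stand-in for the contents of the parquet file
txt = pd.DataFrame(
    {
        "POS_Cardholder_Presence": ["ECommerce", "InStore", "ECommerce"],
        "Amount": [10.0, 25.5, 7.25],
    }
)

# Boolean mask: True for the rows where the column equals "ECommerce"
mask = txt["POS_Cardholder_Presence"] == "ECommerce"

# .loc with a boolean mask keeps only the rows marked True
txt_ecommerce = txt.loc[mask]

print(txt_ecommerce)
```

The result is still a pandas DataFrame, so no toPandas() call is needed on this path.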
Hey guys, I am learning how to convert a dictionary to a data frame. I have a nested dictionary called user_dict like this:
File of dictionary in pickle format
[{'1000003': {'car': 0.0, 'car_passenger': 0.0, 'pt': 0.0, 'walk': 0.0, 'bike': 0.0}},
 {'1000007': {'car': 0.0, 'car_passenger': 0.0, 'pt': 856.0786277323101, 'walk': 2546.869189662443, 'bike': 0.0}},
 {'1000008': {'car': 0.0, 'car_passenger': 34189.569164682835, 'pt': 0.0, 'walk': 0.0, 'bike': 0.0}},
 {'1000009': {'car': 0.0, 'car_passenger': 0.0, 'pt': 0.0, 'walk': 0.0, 'bike': 9847.472668350396}}]
I want to convert the dict to a data frame like this:
         car  car_passenger       pt      walk      bike
1000003  0.0            0.0      0.0       0.0       0.0
1000007  0.0            0.0  856.078  2546.869       0.0
1000008  0.0      34189.569      0.0       0.0       0.0
1000009  0.0            0.0      0.0       0.0  9847.472
I tried to convert it with from_dict:
df =pd.DataFrame.from_dict(user_dict,orient='index')
df
But I got this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-2ef0fc236180> in <module>
----> 1 df =pd.DataFrame.from_dict(user_dict,orient='index')
2 df
/Library/Python/3.7/site-packages/pandas/core/frame.py in from_dict(cls, data, orient, dtype, columns)
1361 if len(data) > 0:
1362 # TODO speed up Series case
-> 1363 if isinstance(list(data.values())[0], (Series, dict)):
1364 data = _from_nested_dict(data)
1365 else:
AttributeError: 'list' object has no attribute 'values'
I do not know how to fix it. Can anyone help me or explain to me how to fix it?
Any help is appreciated.
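One possible way around this, as a sketch: the traceback shows that from_dict calls data.values(), so it expects a single dict, whereas user_dict as printed above is a list of one-key dicts. Assuming that structure, merging the list into one dict first should give the desired layout (the name merged is just an illustrative choice, and only two of the entries are repeated here):

```python
import pandas as pd

# Same shape as user_dict in the question: a list of one-key dicts
user_dict = [
    {"1000003": {"car": 0.0, "car_passenger": 0.0, "pt": 0.0, "walk": 0.0, "bike": 0.0}},
    {"1000007": {"car": 0.0, "car_passenger": 0.0, "pt": 856.0786277323101,
                 "walk": 2546.869189662443, "bike": 0.0}},
]

# Merge the list into one dict keyed by user id
merged = {}
for entry in user_dict:
    merged.update(entry)

# orient='index' turns the outer keys into row labels
df = pd.DataFrame.from_dict(merged, orient="index")
print(df)
```

If two list entries ever shared the same user id, update() would keep only the last one, so this sketch assumes the ids are unique.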
"sklearn.datasets" is a scikit package, where it contains a method load_iris().
By default, load_iris() returns an object that holds data, target, and other members. To get the actual values you have to read the data and target attributes themselves.
The file 'iris.csv', on the other hand, holds the features and the target together.
FYI: if you set return_X_y=True in load_iris(), you get the features and the target directly.
from sklearn import datasets
data, target = datasets.load_iris(return_X_y=True)
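For comparison, here is a short sketch of both return styles (this assumes scikit-learn is installed; the shapes in the comments are those of the bundled Iris data):

```python
from sklearn import datasets

# Default: a Bunch object that exposes data, target and metadata as attributes
iris = datasets.load_iris()
X_bunch = iris.data      # feature matrix, shape (150, 4)
y_bunch = iris.target    # integer class labels, shape (150,)

# return_X_y=True: a plain (features, target) tuple with no metadata
data, target = datasets.load_iris(return_X_y=True)

print(X_bunch.shape, y_bunch.shape)
print(data.shape, target.shape)
```

Both calls give the same numbers; the tuple form simply drops the extra fields such as feature_names and DESCR.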
The Iris dataset from sklearn comes in sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x = iris.data
y = iris.target
But when you read the CSV file as a DataFrame, as you mentioned:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
              2            3
0  petal_length  petal_width
1           1.4          0.2
2           1.4          0.2
3           1.3          0.2
4           1.5          0.2
Here the column names are the integers 2 and 3, and the real header ('petal_length', 'petal_width') has ended up as the first data row.
First of all, you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not pass header=None, because your CSV file already includes the column names, i.e. the header row.
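The effect of header=None is easy to reproduce on a tiny in-memory CSV. The two data rows below are only a stand-in for iris.csv, and the sepal column names are assumed:

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; the first line is the header
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)

# header=None: columns are numbered and the header line becomes row 0
wrong = pd.read_csv(io.StringIO(csv_text), header=None)

# Default: the first line is used for the column names
df = pd.read_csv(io.StringIO(csv_text))

print(list(wrong.columns))  # integer labels 0..4
print(list(df.columns))     # names taken from the header line
```

Note that with header=None the numeric columns are also read as strings, because the header text is mixed in with the values.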
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species'] # select by name with []; .iloc only accepts integer positions
Also, if you want to convert the labels from strings to numeric codes, use sklearn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
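Here is a small sketch of what the encoder does, using a hand-written list of species labels in place of the real column (this assumes scikit-learn is installed). The distinct labels are sorted before the integer codes are assigned:

```python
from sklearn import preprocessing

# Hypothetical labels standing in for the 'species' column
y = ["setosa", "versicolor", "virginica", "setosa"]

le = preprocessing.LabelEncoder()
y_encoded = le.fit_transform(y)

print(list(le.classes_))  # the distinct labels, in sorted order
print(list(y_encoded))    # one integer code per input label

# inverse_transform maps the codes back to the original strings
restored = le.inverse_transform(y_encoded)
```

Keeping the fitted le object around is useful if you later need to turn predicted codes back into species names.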