In [20]: import pandas as pd
    ...: import numpy as np
    ...:
    ...: df = pd.DataFrame(np.arange(5))

In [21]: df.dtypes.value_counts()
Out[21]:
int64    1
dtype: int64
.get_dtype_counts() is deprecated since version 0.25.0
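To see the replacement on a frame with more than one dtype, here is a minimal sketch. The column names and values are made up for illustration, and the integer column is forced to int64 so the result does not depend on the platform default.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame, for illustration only.
df = pd.DataFrame({
    "a": np.arange(3, dtype="int64"),   # int64
    "b": np.arange(3) * 1.5,            # float64
    "c": np.arange(3) * 2.0,            # float64
    "d": [True, False, True],           # bool
})

# Count how many columns share each dtype.
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)

# Plain dict keyed by dtype name, convenient for lookups.
counts_by_name = {str(dtype): int(n) for dtype, n in dtype_counts.items()}
print(counts_by_name)
```

The dict at the end is only a convenience; df.dtypes.value_counts() on its own is the direct replacement for the deprecated call.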

.get_dtype_counts() has been deprecated since pandas 0.25.0; use .dtypes.value_counts() instead.
sklearn.datasets is a scikit-learn module that provides the function load_iris().
By default, load_iris() returns an object that holds data, target and other members. To get the actual values you have to read its data and target attributes.
'iris.csv', by contrast, holds the features and the target together in one table.
FYI: if you set return_X_y=True in load_iris(), you get the features and the target directly.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
The Iris dataset from scikit-learn comes in scikit-learn's Bunch format:
from sklearn import datasets
iris = datasets.load_iris()
print(type(iris))
print(iris.keys())
output (the exact class path and keys vary with the scikit-learn version):
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
That is why you can access it as:
x = iris.data
y = iris.target
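But when you read the CSV file into a DataFrame the way you did:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
the output is:
              2            3
0  petal_length  petal_width
1           1.4          0.2
2           1.4          0.2
3           1.3          0.2
4           1.5          0.2
Here the column names are the integers 2 and 3, and the real header line has ended up as data in row 0, because header=None tells pandas that the file has no header row.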
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not pass header=None, because your CSV file already contains the column names (the header) in its first line.
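The difference between the two calls can be checked on a small in-memory file. This is a minimal sketch: the CSV text below is a hypothetical two-row excerpt standing in for 'iris.csv', and the sepal column names are assumed for illustration.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for 'iris.csv' (assumed column names).
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)

# header=None: the first line is treated as data, columns are numbered 0..4.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Default (header inferred): the first line supplies the column names.
with_header = pd.read_csv(io.StringIO(csv_text))

print(no_header.columns.tolist())
print(with_header.columns.tolist())
print(no_header.shape, with_header.shape)
print(with_header["petal_length"].dtype)
```

With header=None the header strings are mixed into the numeric columns, so those columns are no longer parsed as floats, which is one more reason to let pandas pick up the header.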
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]]  # columns at positions 2 and 3, i.e. 'petal_length' and 'petal_width'
y = df.iloc[:, 4]  # label column, i.e. 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']
Also, if you want to convert the labels from strings to integers, use scikit-learn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
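To make the encoding visible, here is a minimal self-contained sketch. The four-element label Series is made up for illustration and stands in for the 'species' column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the 'species' column.
y = pd.Series(["setosa", "versicolor", "virginica", "setosa"])

le = LabelEncoder()
y_encoded = le.fit_transform(y)

# classes_ holds the sorted unique labels; each code is an index into it.
print(list(le.classes_))
print(y_encoded.tolist())

# inverse_transform maps the integer codes back to the original strings.
decoded = le.inverse_transform(y_encoded)
print(list(decoded))
```

Keeping the fitted le object around lets you translate predictions back into species names later.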
df[col] returns a DataFrame (not a Series) if df contains more than one column with that name. Make sure df has no duplicate column names, as that is the likely source of this error.
You seem to be getting back a DataFrame rather than a Series from X[col]. It is not clear why, because you did not supply the full structure and data of your DataFrame.
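The duplicate-name behaviour is easy to reproduce. This is a minimal sketch on a made-up frame; dropping all but the first occurrence of a repeated name is only one possible fix and is shown here as an assumption about what you want.

```python
import pandas as pd

# Hypothetical frame in which the name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

dup = df["a"]      # two columns match, so this is a DataFrame
single = df["b"]   # one column matches, so this is a Series
print(type(dup).__name__, type(single).__name__)

# List the column names that occur more than once.
duplicated = df.columns[df.columns.duplicated()].tolist()
print(duplicated)

# One option: keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(type(deduped["a"]).__name__)
```

If both duplicated columns carry data you need, renaming them is safer than dropping one.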
.dtype is for pandas Series: https://pandas.pydata.org/docs/reference/api/pandas.Series.dtype.html
.dtypes is for pandas DataFrames (it also exists on Series, where it returns the same value as .dtype): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html
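A quick way to see the difference between the two attributes is the following minimal sketch; the frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame.
df = pd.DataFrame({"n": [1, 2, 3], "x": [0.5, 1.5, 2.5]})
s = df["x"]

# On a Series, .dtype and .dtypes give the same single dtype.
print(s.dtype)
print(s.dtypes)

# On a DataFrame, .dtypes is a Series with one dtype per column.
print(df.dtypes)

# A DataFrame has no .dtype attribute (unless a column happens to be named "dtype").
has_dtype = hasattr(df, "dtype")
print(has_dtype)
```

So code that calls .dtype on the result of X[col] fails as soon as X[col] is a DataFrame instead of a Series.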