value_counts is a Series method rather than a DataFrame method (and you are trying to use it on a DataFrame, clean). You need to perform this on a specific column:
clean[column_name].value_counts()
It doesn't usually make sense to perform value_counts on a DataFrame, though I suppose you could apply it to every entry by flattening the underlying values array:
pd.value_counts(df.values.flatten())
(Answer above from Andy Hayden on Stack Overflow.)
Note that df.count() is different: it returns the number of non-null entries in each column of a dataframe, not the counts of each value.
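As a runnable sketch of the above (the column names here are illustrative, not from the original question): per-column counts come from Series.value_counts(), and whole-frame counts can be had by flattening the underlying array first. The top-level pd.value_counts has been deprecated in newer pandas, so wrapping the flattened array in a Series is the more durable form.

```python
import pandas as pd

# Illustrative data; "col" stands in for your actual column name
df = pd.DataFrame({"col": ["a", "b", "a", "a"],
                   "other": ["x", "x", "y", "x"]})

# Counts of each distinct value in one column ('a' appears 3 times)
print(df["col"].value_counts())

# Counts over every entry in the frame: flatten, then count
flat_counts = pd.Series(df.values.ravel()).value_counts()
print(flat_counts)
```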
python - Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while groupby functionality on dataframe - Stack Overflow
Pandas AttributeError: 'DataFrame' object has no attribute 'group_by'
I mean, isn't it groupby(), not group_by()?
python 3.x - How to count_values and then groupby - Stack Overflow
Hello,
Has anyone ever come across this before?
I'm trying to group some data in a dataframe and getting this error. The steps I've taken are:
1) in a for loop: read in a csv from an api using pd.read_csv(), replaced some values in a column using a for loop and .loc[], and appended the resulting dataframe to a list
2) concatenated the list of dataframes using pd.concat()
3) added a calculated column to the new DF by multiplying another column
4) added two empty columns
5) filtered the DF using .loc[] based on a value within a column
6) filtered the DF using .loc[] based on a value in a different column
7) tried to use this code:
new_DF = old_df.group_by(['col1', 'col_2', 'col_3', 'adgroup', 'col_4', 'col5', 'col6'], as_index=False)[['col7', 'col8', 'col9']].sum()
The DF seems to be behaving normally; for example, I can do dtypes and columns on it and add columns calculated from other columns. What is super frustrating is that I can do df.to_csv() and then pd.read_csv() on the DF, and then I'm able to do the grouping I want (however this isn't ideal, which is why I'm posting).
Any advice would be appreciated.
Cheers
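For reference, the pandas method is groupby (no underscore); calling group_by raises exactly this AttributeError regardless of how the frame was built. A minimal sketch with made-up column names (col1 and col7 are placeholders, not the poster's real data):

```python
import pandas as pd

# Placeholder data standing in for the concatenated frame
df = pd.DataFrame({"col1": ["a", "a", "b"],
                   "col7": [1, 2, 3]})

# df.group_by(...)  # AttributeError: 'DataFrame' object has no attribute 'group_by'
out = df.groupby(["col1"], as_index=False)[["col7"]].sum()
print(out)
#   col1  col7
# 0    a     3
# 1    b     3
```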
Good evening,
I've been trying to work with this data.
data = {
    'ID': [1,1,1,2,2,3,3,3,3,4,4],
    'Value': [23,22,11,98,34,42,30,36,77,58,5]
}
df = pd.DataFrame(data)
df
ID Value
0 1 23
1 1 22
2 1 11
3 2 98
4 2 34
5 3 42
6 3 30
7 3 36
8 3 77
9 4 58
10 4 5
Now for this, I'm trying to keep only the last two rows for each ID. Before I share my code, here's my intended result:
ID Value
1 1 22
2 1 11
3 2 98
4 2 34
7 3 36
8 3 77
9 4 58
10 4 5
The code I tried to use is this (along with resulting error):
for i, row in df.iterrows():
    while row['ID'].value_counts() > 2:
        df.drop(i)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-286-15a45b0e4fb7> in <module>
1 for i, row in df.iterrows():
----> 2 while row['ID'].value_counts() > 2:
3 df.drop(i)
AttributeError: 'numpy.int64' object has no attribute 'value_counts'
I know iterrows is really frowned upon, but speed isn't an issue; I'm still practicing my data modeling skills and I hope to drop it once I get more experienced. I feel like this code executes, but without the grouping; as you can see, I want the rows grouped by ID.
Thanks!
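One way to get the intended result without iterrows, using the df built above: DataFrame.groupby combined with .tail(2) keeps the last two rows within each ID group, preserving the original row order.

```python
import pandas as pd

data = {
    'ID': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    'Value': [23, 22, 11, 98, 34, 42, 30, 36, 77, 58, 5],
}
df = pd.DataFrame(data)

# tail(2) returns the last two rows of each group, in the original order
last_two = df.groupby('ID').tail(2)
print(list(last_two.index))  # [1, 2, 3, 4, 7, 8, 9, 10]
```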
You tried to use .Values with a capital V instead of .values. Changing the capital V to a lowercase v should fix the error you're getting.
The pandas DataFrame object does not have an attribute named Values; it is values, with a lowercase "v". For further reference, see the pandas documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html
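A quick sketch of the difference: attribute access is case-sensitive, so .Values fails while .values returns the underlying NumPy array.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# df.Values would raise: AttributeError: 'DataFrame' object has no attribute 'Values'
arr = df.values  # lowercase: the underlying numpy array
print(type(arr).__name__)  # ndarray
print(arr.shape)           # (2, 2)
```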
"sklearn.datasets" is a scikit package, where it contains a method load_iris().
load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.
Whereas 'iris.csv', holds feature and target together.
FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file as DataFrame as mentioned by you:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
2 3
0 petal_length petal_width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
Here the column names are '2' and '3', and the original headers ('petal_length', 'petal_width') have ended up as the first row of data.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not include header=None, as your CSV file includes the column names, i.e. the headers.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']
Also, if you want to convert labels from string to numerical format use sklearn LabelEncoder
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
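A small self-contained run of the LabelEncoder step (the labels here are illustrative; in the iris case they would come from the 'species' column): fit_transform maps each string label to an integer, with the classes ordered alphabetically in le.classes_.

```python
from sklearn import preprocessing

# Illustrative labels standing in for df['species']
labels = ['versicolor', 'setosa', 'virginica', 'setosa']

le = preprocessing.LabelEncoder()
y = le.fit_transform(labels)

print(list(le.classes_))  # ['setosa', 'versicolor', 'virginica']
print(list(y))            # [1, 0, 2, 0]
```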