I think I understood what you wanted to do and what you did not understand (mainly about the way to modify objects with pandas). I assume that you wanted to:
- compute your aggregation by payment date in data
- and then set its index to the 'Payment Date' field
Short answer: if you want to have this result in data, simply execute:
data = data.groupby('Payment Date ')['Payment Amount '].sum().to_frame()
'Payment Date ' (trailing space included, matching your column name) will be your new index; to_frame prevents your single-column result from being squeezed into a pandas Series (which I think you were trying to avoid by resetting your index and then setting it back).
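To see the difference to_frame makes, here is a toy example (illustrative column names, without the trailing spaces from the original question):

```python
import pandas as pd

# Made-up payments standing in for the question's data.
data = pd.DataFrame({
    "Payment date": ["2023-01", "2023-01", "2023-02"],
    "Payment amount": [10.0, 5.0, 7.0],
})

# Selecting a single column and summing yields a Series...
summed = data.groupby("Payment date")["Payment amount"].sum()
print(type(summed))  # pandas Series, indexed by payment date

# ...while to_frame keeps it as a one-column DataFrame with the same index.
framed = summed.to_frame()
print(type(framed))
```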
Let's dive into your code.
First line
data = data.groupby('Payment Date ')
The first line is OK, but might not do exactly what you want. You are taking data, which I assume is a pandas DataFrame, and reassigning it to a pandas DataFrameGroupBy object. This kind of object does not hold any data; you can see it simply as a mapping between the index of your original DataFrame and the associated groups (here, payment dates).
Anyway, you got your groupby object into data.
Second line
data['Payment Amount '].sum().reset_index()
On its own, this line changes nothing. It computes and displays the result in your Jupyter notebook, but nothing has been stored back into data. data is still the same DataFrameGroupBy object.
Third line
data = data.set_index('Payment Date ', inplace = True)
An exception is raised, saying that a DataFrameGroupBy object has no set_index method. This is because data has not been changed by your second line of code. (Note that even if set_index did exist here, reassigning the result of a call with inplace=True back to data would set it to None, since inplace operations return None.)
Even so, I would encourage you to avoid using inplace=True anywhere in your code. You should always go with explicit reassignments.
Your code could look like (if you don't like the short answer above):
data = data.groupby('Payment Date ')
data = data['Payment Amount '].sum().reset_index()
data = data.set_index('Payment Date ') # No inplace=True!
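On a small made-up DataFrame (simplified column names, no trailing spaces), the three-step version runs like this:

```python
import pandas as pd

# Invented payments for illustration only.
data = pd.DataFrame({
    "Payment date": ["2023-01", "2023-02", "2023-01"],
    "Payment amount": [100, 50, 25],
})

data = data.groupby("Payment date")
data = data["Payment amount"].sum().reset_index()  # back to a plain DataFrame
data = data.set_index("Payment date")              # explicit reassignment, no inplace=True

print(data)
```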
Answer from Pierre Massé on Stack Overflow: "'DataFrameGroupBy' object has no attribute 'reset_index'?"
python 3.x - AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method - Stack Overflow
python - AttributeError: 'list' object has no attribute 'reset_index' - Stack Overflow
groupby needs some aggregation function(s), like mean, sum, or max:
df.sort_values(['col5'],ascending=False).groupby('col1').mean().reset_index()
Or:
df.sort_values(['col5'],ascending=False).groupby('col1', as_index=False).mean()
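A minimal illustration of the two equivalent forms, with invented data (the column names col1 and col5 are just placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col5": [3, 1, 2],
})

# Aggregate, then restore 'col1' as a regular column with reset_index.
out1 = df.sort_values(["col5"], ascending=False).groupby("col1").mean().reset_index()

# Same result; as_index=False keeps 'col1' as a column from the start.
out2 = df.sort_values(["col5"], ascending=False).groupby("col1", as_index=False).mean()

print(out1.equals(out2))
```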
You can try the code below; I had a similar issue.
grouped=data.groupby(['Colname'])
grouped.apply(lambda _df: _df.sort_values(by=['col_to_be_sorted']))
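Sketched end to end with invented data. Note that apply returns a new DataFrame; grouped itself is not modified, so assign the result if you want to keep it:

```python
import pandas as pd

# Placeholder column names matching the snippet above.
data = pd.DataFrame({
    "Colname": ["x", "x", "y"],
    "col_to_be_sorted": [2, 1, 3],
})

grouped = data.groupby(["Colname"])
# Sort each group's rows by value; the result is a new DataFrame.
result = grouped.apply(lambda _df: _df.sort_values(by=["col_to_be_sorted"]))
print(result)
```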
I have a dataframe that I've pared down to two columns, 'last date read' and 'read count'. I formatted the dates to %Y-%m, then grouped by date and summed the total number of books read.
I wanted to plot it using month_calplot from plotly_calplot, but I get the following error and I'm not sure where to go from here:
Error:
line 247, in month_calplot
gData = data.set_index(x)[y].groupby(Grouper(freq="M")).sum()
File "C:\Python\Python310\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'set_index'
Code:
df2 = df2[['last date read', 'read count']]
df2['last date read'] = pd.to_datetime(df2['last date read'])
df2['last date read'] = df2['last date read'].dt.strftime('%Y-%m')
df2 = df2.groupby(['last date read'])['read count'].sum()
print(df2)
fig3 = month_calplot(
df2,
x='last date read',
y='read count',
colorscale="Purpor",
showscale=True,
total_height=250,
dark_theme=True)
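I have not run plotly_calplot here, but the traceback suggests month_calplot expects a DataFrame it can call set_index on, while groupby(...)['read count'].sum() returns a Series. A hedged fix, sketched with made-up data, is to call reset_index() on the Series before passing it to the plotting function:

```python
import pandas as pd

# Made-up reading log standing in for df2 in the question.
df2 = pd.DataFrame({
    "last date read": ["2022-01", "2022-01", "2022-03"],
    "read count": [1, 2, 1],
})

# Summing a single selected column yields a Series, which has no set_index...
summed = df2.groupby(["last date read"])["read count"].sum()
print(type(summed))

# ...so reset_index() turns it back into the two-column DataFrame that
# month_calplot's set_index(x)[y] lookup appears to expect.
df2 = summed.reset_index()
print(type(df2))
```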
Has anyone ever come across this before?
I'm trying to group some data in a dataframe and getting this error. The steps I've taken are:
1) in a for loop:
- read in a csv from an api using pd.read_csv()
- replaced some values in a column using a for loop and .loc[]
- appended the resulting data frame to a list
2) concatenated the list of dataframes using pd.concat()
3) added a calculated column to the new DF by multiplying another column
4) added two empty columns
5) filtered the DF using .loc[] based on a value within a column
6) filtered the DF using .loc[] based on a value in a different column
7) tried to use this code:
new_DF = old_df.group_by(['col1', 'col_2', 'col_3', 'adgroup', 'col_4', 'col5', 'col6'], as_index=False)[['col7', 'col8', 'col9']].sum()
The DF seems to be behaving normally; for example, I can do dtypes and columns on it and add columns which are calculated from other columns. What is super frustrating is that I can do df.to_csv() and then pd.read_csv() on the DF, and then I'm able to do the grouping I want (however this isn't ideal, which is why I'm posting).
Any advice would be appreciated.
Cheers
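One thing worth double-checking in step 7 above: pandas spells the method groupby, not group_by, so the call as written should raise AttributeError: 'DataFrame' object has no attribute 'group_by'. That may not be the whole story (a CSV round trip would not fix a misspelled method name), but it is worth ruling out. A minimal sketch with invented data and placeholder column names:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col7": [1, 2, 3],
})

# pandas' method is groupby (no underscore); df.group_by(...) raises AttributeError.
out = df.groupby(["col1"], as_index=False)[["col7"]].sum()
print(out)
```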
"sklearn.datasets" is a scikit-learn module which contains, among others, the function load_iris().
By default, load_iris() returns a Bunch object which holds data, target and other members in it. In order to get the actual values you have to read the data and target contents themselves.
'iris.csv', on the other hand, holds features and target together.
FYI: if you set return_X_y=True in load_iris(), then you will directly get features and target.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
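A quick sanity check of what return_X_y=True hands back: two plain NumPy arrays, 150 samples by 4 features and 150 labels:

```python
from sklearn import datasets

# return_X_y=True skips the Bunch and returns (features, target) directly.
X, y = datasets.load_iris(return_X_y=True)
print(X.shape)  # (150, 4)
print(y.shape)  # (150,)
```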
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file as DataFrame as mentioned by you:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
2 3
0 petal_length petal_width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
Here the column names are '2' and '3'.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not include header=None, as your CSV file includes the column names, i.e. the headers.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']
Also, if you want to convert labels from string to numerical format, use sklearn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
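For example, with the three iris species names (LabelEncoder assigns integers in alphabetical order of the classes):

```python
from sklearn import preprocessing

# Hypothetical string labels standing in for the 'species' column.
le = preprocessing.LabelEncoder()
y = le.fit_transform(["setosa", "versicolor", "virginica", "setosa"])
print(list(le.classes_))  # ['setosa', 'versicolor', 'virginica']
print(list(y))            # [0, 1, 2, 0]
```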