On checking, I found that my polars version is:
pl.__version__
0.17.3
https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.groupby.html
In this version the method is still spelled groupby, so I need to do:
df.groupby("a").agg(pl.col("b").sum()) # there is no underscore in groupby
# output
shape: (3, 2)
a     b
str   i64
"a"   2
"c"   3
"b"   5
The documentation for newer versions says:
Deprecated since version 0.19.0: This method has been renamed to DataFrame.group_by().
This is the documentation for polars version 0.19:
https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.group_by.html#polars-dataframe-group-by
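If the same script has to run on both sides of the 0.19 rename, one option is to look the method up at runtime. This is only a minimal sketch and not part of the polars API: group_by_compat and demo are hypothetical helper names, and the sample data is made up so that the sums match the output shown above.

```python
def group_by_compat(df, *keys):
    """Call df.group_by(*keys) if it exists, else fall back to df.groupby(*keys)."""
    # polars >= 0.19 exposes group_by; older releases such as 0.17.3 only have groupby.
    method = getattr(df, "group_by", None)
    if method is None:
        method = df.groupby
    return method(*keys)


def demo():
    # Imported here so the helper above has no hard dependency on polars.
    import polars as pl

    # Made-up sample data (assumption, not taken from the original post).
    df = pl.DataFrame({"a": ["a", "b", "a", "b", "c"], "b": [1, 2, 1, 3, 3]})
    return group_by_compat(df, "a").agg(pl.col("b").sum())
```

Calling demo() should give the per-group sums on either an old or a new polars release, since only the method lookup differs.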
Answer from Talha Tayyab on Stack Overflow
Hello,
Has anyone ever come across this before?
I'm trying to group some data in a dataframe and getting this error. The steps I've taken are:
1) in a for loop:
- read in a csv from an api using pd.read_csv()
- replaced some values in a column using a for loop and .loc[]
- appended the resulting data frame to a list
2) concatenated the list of dataframes using pd.concat()
3) added a calculated column to the new DF by multiplying another column
4) added two empty columns
5) filtered the DF using .loc[] based on a value within a column
6) filtered the DF using .loc[] based on a value in a different column
7) tried to use this code:
new_DF = old_df.group_by(['col1', 'col_2', 'col_3', 'adgroup', 'col_4', 'col5', 'col6'], as_index=False)[['col7', 'col8', 'col9']].sum()
The DF seems to be behaving normally: for example, I can call dtypes and columns on it and add columns calculated from other columns. What is super frustrating is that if I write the DF out with df.to_csv() and read it back with pd.read_csv(), I'm then able to do the grouping I want (this isn't ideal, which is why I'm posting).
Any advice would be appreciated.
Cheers
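For comparison, pandas spells the method groupby with no underscore, and calling group_by on a pandas DataFrame raises an AttributeError, which may or may not be the error hit here. The sketch below is an illustration under assumptions: the column names and values are made up, and it only covers the grouping step, not the CSV loading or filtering.

```python
import pandas as pd

# Made-up stand-in for the concatenated, filtered frame (assumption).
old_df = pd.DataFrame(
    {
        "col1": ["x", "x", "y"],
        "adgroup": ["g1", "g1", "g2"],
        "col7": [1, 2, 3],
        "col8": [10.0, 20.0, 30.0],
    }
)

# pandas has DataFrame.groupby; there is no DataFrame.group_by.
has_group_by = hasattr(old_df, "group_by")
print(has_group_by)

# as_index=False keeps the grouping keys as ordinary columns.
new_df = old_df.groupby(["col1", "adgroup"], as_index=False)[["col7", "col8"]].sum()
print(new_df)
```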
AttributeError: 'DataFrame' object has no attribute 'name'
The solution is to set the values in place with .loc or .iloc, rather than creating a copy.
Creating a copy of df loses the name:
df = df[::-1] # returns a new object, so the name attribute is lost
Setting the values in place keeps the original object intact, along with its name:
df.iloc[:] = df.iloc[:, ::-1] # reverses the columns while keeping the original object
Example code that reverses values along the column axis:
df = pd.DataFrame([[6,10]], columns=['a','b'])
df.name='t'
print(df.name)
print(df)
df.iloc[:] = df.iloc[:,::-1]
print(df)
print(df.name)
outputs:
t
   a   b
0  6  10
    a  b
0  10  6
t
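The difference between the two approaches can be checked side by side. This is a small sketch, assuming name is just an ad hoc Python attribute set on the object (pandas does not track it); reversed_copy is a made-up variable name, and .to_numpy() is used so the assignment is purely positional.

```python
import pandas as pd

df = pd.DataFrame([[6, 10]], columns=["a", "b"])
df.name = "t"  # ad hoc attribute on this one object, not a pandas feature

# Slicing returns a new DataFrame object, which does not carry the attribute.
reversed_copy = df[::-1]
print(hasattr(reversed_copy, "name"))

# Assigning in place keeps the same object, so the attribute is still there.
df.iloc[:] = df.iloc[:, ::-1].to_numpy()
print(df.name)
print(df)
```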
A workaround is to store the name in df.columns.name and read it back when needed.
Example:
df = pd.DataFrame()
df.columns.name = 'name'
print(df.columns.name)
name
The traceback points to the line df = df[df.name != "gif"], even though deleting the rows that contain those characters is exactly what I'm trying to do.
import pandas as pd
df = pd.read_csv("Output Configured 3 edit.csv")
df = df[df.name != "gif"]
# df.column_name != whole string from the cell
# now, all the rows where the column Name has the value "gif" should be deleted
df.to_csv(file, index=False)
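Attribute access such as df.name only works when a column with exactly that spelling exists, and column labels are case-sensitive. A hedged sketch of the bracket form follows, assuming the column is actually called "Name" (as the comment above suggests) and using made-up rows; exact and no_gif are hypothetical variable names.

```python
import pandas as pd

# Made-up data; the real CSV contents are not shown in the question.
df = pd.DataFrame({"Name": ["cat.png", "gif", "dog.gif"], "Value": [1, 2, 3]})

# Bracket access uses the exact column label, so it does not depend on
# attribute lookup the way df.name does.
exact = df[df["Name"] != "gif"]  # drops rows whose whole cell equals "gif"

# To drop every row whose cell merely contains "gif", match on the substring.
no_gif = df[~df["Name"].str.contains("gif", regex=False, na=False)]

print(exact)
print(no_gif)
```

The first filter compares whole cell values, while the second removes any row where "gif" appears anywhere in the cell, which may be closer to what is wanted here.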