DataFrames do not have that method; columns in DataFrames do:
df['A'].unique()
Or, to get the names with the number of observations (using the DataFrame given by closedloop):
>>> df.groupby('person').person.count()
Out[80]:
person
0 2
1 3
Name: person, dtype: int64
The answer above is from Alexander on Stack Overflow.
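The pattern above can be checked end to end. The frame below is made up for illustration; only the person column name is taken from the answer, and column A is hypothetical:

```python
import pandas as pd

# Hypothetical data mirroring the answer's column names.
df = pd.DataFrame({'person': [0, 0, 1, 1, 1],
                   'A': ['x', 'y', 'x', 'z', 'x']})

# unique() is a Series method, so select a column first.
print(df['A'].unique())

# Count observations per person via groupby on the same column.
print(df.groupby('person').person.count())
```

Calling df.unique() on the whole frame raises the AttributeError from the title, while the column access df['A'].unique() works.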
Rather than removing duplicates during the pivot-table step, use df.drop_duplicates() to drop them selectively beforehand.
For example, if you are pivoting with index='c0' and columns='c1', then this simple step yields the correct counts.
In this example the 5th row is a duplicate of the 4th (ignoring the non-pivoted c_other column):
import pandas as pd

data = {'c0': [0, 1, 0, 1, 1], 'c1': [0, 0, 1, 1, 1],
        'person': [0, 0, 1, 1, 1], 'c_other': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# keep only one copy of rows that match on the columns involved in the pivot
df2 = df.drop_duplicates(subset=['c0', 'c1', 'person'])
pd.pivot_table(df2, index='c0', columns='c1', values='person', aggfunc='count')
This correctly outputs
c1 0 1
c0
0 1 1
1 1 1
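For comparison, a small sketch (reusing the same made-up data) of what the pivot gives with and without the drop_duplicates() step; without it, the duplicated row inflates the (c0=1, c1=1) cell:

```python
import pandas as pd

data = {'c0': [0, 1, 0, 1, 1], 'c1': [0, 0, 1, 1, 1],
        'person': [0, 0, 1, 1, 1], 'c_other': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Without deduplication the repeated (c0=1, c1=1, person=1) row is counted twice.
raw = pd.pivot_table(df, index='c0', columns='c1',
                     values='person', aggfunc='count')

# With deduplication every cell counts distinct (c0, c1, person) rows.
dedup = pd.pivot_table(df.drop_duplicates(subset=['c0', 'c1', 'person']),
                       index='c0', columns='c1',
                       values='person', aggfunc='count')
print(raw)
print(dedup)
```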
[BUG] AttributeError: 'DataFrame' object has no attribute 'unique'
python - Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while groupby functionality on dataframe - Stack Overflow
AttributeError: 'DataFrame' object has no attribute 'unique'
Pandas unique doest not work on groupby object when applied on several columns - Stack Overflow
The DataFrame object doesn't have nunique; only Series do. You have to pick out which column you want to apply nunique() to, which you can do with a simple dot operator:
df.groupby('A').apply(lambda x: x.B.nunique())
will print:
A
bar 2
flux 2
foo 3
And doing:
df.groupby('A').apply(lambda x: x.E.nunique())
will print:
A
bar 1
flux 2
foo 2
Alternatively you can do this with one function call using:
df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})
which will print:
B E
A
bar 2 1
flux 2 2
foo 3 2
To answer your question about why your lambda prints the A column as well: when you do a groupby/apply operation, you are iterating through three DataFrame objects, each a sub-DataFrame of the original. Applying an operation to a sub-DataFrame applies it to each of its Series, so there are three Series per DataFrame that the nunique() operator runs over.
The first Series evaluated in each sub-DataFrame is the A Series, and since you grouped by A, each sub-DataFrame contains only one unique value in A. This is why you ultimately get an A result column that is all 1's.
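That explanation can be reproduced directly by iterating over the groups. The frame below is hypothetical but mirrors the column names A, B and E used above, with values chosen to match the counts shown:

```python
import pandas as pd

# Hypothetical frame matching the answer's column names and group counts.
df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'flux', 'flux'],
    'B': [1, 2, 3, 4, 5, 6, 7],
    'E': [1, 1, 2, 3, 3, 4, 5],
})

# Each sub-DataFrame still contains the grouping column A, which is
# constant within its group, so its nunique() is always 1.
for key, sub in df.groupby('A'):
    print(key, sub.nunique().to_dict())
```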
I encountered the same problem. Upgrading pandas to the latest version solved the problem for me.
df.groupby('A').nunique()
The above code did not work for me in Pandas version 0.19.2. I upgraded it to Pandas version 0.21.1 and it worked.
You can check the version using the following code:
print('Pandas version ' + pd.__version__)
The reason for the error is that unique works only on Series, so only SeriesGroupBy.unique is implemented, not DataFrameGroupBy.unique.
Series.unique does work here, converting each result to a list:
df = df.groupby('group')[['param1', 'param2']].agg(lambda x: list(x.unique()))
print (df)
param1 param2
group
1 [1.0, 5.0] [5, 6]
2 [8.0] [9]
3 [nan, 2.0, 3.0] [10, 11, 12]
4 [nan] [1]
Alternatively, pass the string 'unique' as the aggregation function:
df = df.groupby('group').agg({'param1': 'unique',
                              'param2': 'unique'})
print(df)
param1 param2
group
1 [1.0, 5.0] [5, 6]
2 [8.0] [9]
3 [nan, 2.0, 3.0] [10, 11, 12]
4 [nan] [1]
Hello,
Has anyone ever come across this before?
I'm trying to group some data in a dataframe and getting this error. The steps I've taken are:
1) in a for loop: read in a CSV from an API using pd.read_csv(), replaced some values in a column using a for loop and .loc[], then appended the resulting dataframe to a list
2) concatenated the list of dataframes using pd.concat()
3) added a calculated column to the new DF by multiplying another column
4) added two empty columns
5) filtered the DF using .loc[] based on a value within a column
6) filtered the DF using .loc[] based on a value in a different column
7) tried to use this code:
new_DF = old_df.group_by(['col1', 'col_2', 'col_3', 'adgroup', 'col_4', 'col5', 'col6'], as_index=False)[['col7', 'col8', 'col9']].sum()
The DF seems to be behaving normally: for example, I can call dtypes and columns on it and add columns calculated from other columns. What is super frustrating is that I can do df.to_csv() and then pd.read_csv() on it, after which I'm able to do the grouping I want (this isn't ideal, which is why I'm posting).
Any advice would be appreciated.
Cheers
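One thing worth checking in step 7): pandas spells the method groupby, with no underscore, so old_df.group_by(...) itself raises an AttributeError on a DataFrame. A minimal sketch of that step with the correct spelling, using hypothetical column names (the question's real columns are unknown):

```python
import pandas as pd

# Hypothetical frame reusing a few of the question's column names.
df = pd.DataFrame({
    'col1': ['a', 'a', 'b'],
    'col7': [1.0, 2.0, 3.0],
    'col8': [10, 20, 30],
})

# groupby (not group_by) with as_index=False keeps col1 as a regular column.
new_df = df.groupby(['col1'], as_index=False)[['col7', 'col8']].sum()
print(new_df)
```

If the error persists with the correct spelling, checking repr(type(old_df)) would confirm the object really is a DataFrame at that point in the pipeline.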