Answer from Freestyle076 on Stack Overflow:

personal favorite way:
df.column_name.value_counts() / len(df)
Gives a series with the column's values as the index and the proportions as the values.
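As a quick sanity check, the one-liner above is equivalent to passing normalize=True to value_counts. A minimal sketch (the column name "col" and the data are made up for illustration):

```python
import pandas as pd

# Toy frame; the column name "col" is illustrative only.
df = pd.DataFrame({"col": ["a", "a", "b", "c"]})

# Proportion of each value in the column.
props = df["col"].value_counts() / len(df)
print(props)

# Equivalent built-in form (same values, possibly a different series name):
assert (props.values == df["col"].value_counts(normalize=True).values).all()
```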
This is a generalized solution which doesn't alter the table or perform any kind of filtering or transformation before using groupby.
> s = df_test.groupby(['A'])['B'].value_counts(normalize=True)
> print(s)
A B
a Y 0.666667
N 0.333333
b N 0.500000
Y 0.500000
Name: B, dtype: float64
The variable s above is a multi-index Series, and you can access any of its rows using .loc
> s.loc[:,'Y']
A
a 0.666667
b 0.500000
Name: B, dtype: float64
Similarly, you can access the details about 'N' using the same series.
> s.loc[:,'N']
A
a 0.333333
b 0.500000
Name: B, dtype: float64
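The answer doesn't show how df_test was built; here is a minimal frame that reproduces the proportions above (the column contents are an assumption reverse-engineered from the printed output):

```python
import pandas as pd

# Assumed contents of df_test, chosen to match the output shown above.
df_test = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b"],
    "B": ["Y", "Y", "N", "Y", "N"],
})

# Per-group proportions of B within each A.
s = df_test.groupby(["A"])["B"].value_counts(normalize=True)
print(s)

# Slice the multi-index Series on its second level:
print(s.loc[:, "Y"])  # proportion of 'Y' in each A group
```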
PS: If you want to understand groupby better, try to decode this code, which is exactly like the above but swaps the column roles and gives a different result.
> r = df_test.groupby(['B'])['A'].value_counts(normalize=True)
> print(r)
B A
N a 0.500000
b 0.500000
Y a 0.666667
b 0.333333
Name: A, dtype: float64
and
> r.loc['Y',:]
B A
Y a 0.666667
b 0.333333
Name: A, dtype: float64
You can use GroupBy.value_counts with normalize=True, and reshaping:
(df
.groupby(['A', 'B'])['C']
.value_counts(normalize=True)
.unstack('C', fill_value=0)
.reset_index()
)
output:
C A B 0 1 2 3
0 x i 0.5 0.5 0.0 0.0
1 x j 0.0 1.0 0.0 0.0
2 y j 0.5 0.0 0.5 0.0
3 y k 0.0 0.0 0.0 1.0
4 z k 0.5 0.0 0.0 0.5
You can use pd.crosstab, a convenience wrapper around the same grouping logic:
pd.crosstab([df['A'], df['B']], df['C'], normalize='index')
Output:
C 0 1 2 3
A B
x i 0.5 0.5 0.0 0.0
j 0.0 1.0 0.0 0.0
y j 0.5 0.0 0.5 0.0
k 0.0 0.0 0.0 1.0
z k 0.5 0.0 0.0 0.5
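The sample frame isn't shown in this answer either; the following input reproduces both tables above (the data is an assumption inferred from the outputs), and confirms that the unstack route and the crosstab route agree:

```python
import pandas as pd

# Assumed sample data, inferred from the output tables above.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y", "y", "z", "z"],
    "B": ["i", "i", "j", "j", "j", "k", "k", "k"],
    "C": [0, 1, 1, 0, 2, 3, 0, 3],
})

# Route 1: groupby + value_counts + unstack.
wide = (
    df.groupby(["A", "B"])["C"]
    .value_counts(normalize=True)
    .unstack("C", fill_value=0)
)

# Route 2: crosstab with row-wise normalization.
ct = pd.crosstab([df["A"], df["B"]], df["C"], normalize="index")

print(wide)
```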
Check the code below:
import pandas as pd
df = pd.DataFrame({'col1':[1,1,1,2,3,3],'col2':['b','a','a','a','a','b']})
df['perc'] = df.groupby(['col1','col2'])['col2'].transform('count')/df.groupby('col1')['col2'].transform('count')
df.round(2).drop_duplicates()
Output (rounded to 2 decimals, duplicate rows dropped):
   col1 col2  perc
0     1    b  0.33
1     1    a  0.67
3     2    a  1.00
4     3    a  0.50
5     3    b  0.50
You can also do something like this:
res = df.groupby('id').value_counts(normalize=True).reset_index(name='perc')
print(res)
id value_type perc
0 1 a 0.666667
1 1 b 0.333333
2 2 a 1.000000
3 3 a 0.500000
4 3 b 0.500000
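This answer's df isn't shown; a minimal frame matching the printed result (the columns id and value_type are inferred from the output), assuming pandas >= 1.4 where DataFrameGroupBy.value_counts is available:

```python
import pandas as pd

# Assumed input, inferred from the printed result above.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3],
    "value_type": ["a", "a", "b", "a", "a", "b"],
})

# Per-id proportions of value_type (requires pandas >= 1.4).
res = df.groupby("id").value_counts(normalize=True).reset_index(name="perc")
print(res)
```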
Use groupby and transform to get the mean; since col1 is boolean, the mean is the fraction of True values:
df['num_true']=df.groupby('id').col1.transform('mean')
id col1 num_true
0 1 True 0.75
1 1 True 0.75
2 1 False 0.75
3 1 True 0.75
4 2 False 0.00
5 2 False 0.00
Here is the requested code:
import pandas as pd
df = pd.DataFrame({"col1": [True,True,False,True,False,False]}, index = [1,1,1,1,2,2])
grouped_df = df.groupby(df.index)
df["num_true"] = grouped_df.sum() / grouped_df.count()
What I did here is to group the dataframe by the index. After that, I sum the number of True values and divide it by the total number of values; the one-row-per-group result is broadcast back to every row through index alignment.
Result:
col1 num_true
1 True 0.75
1 True 0.75
1 False 0.75
1 True 0.75
2 False 0.00
2 False 0.00
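A sketch of the same computation that sidesteps the index-alignment subtlety: transform returns a result already aligned to every original row (same toy data as above):

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [True, True, False, True, False, False]},
    index=[1, 1, 1, 1, 2, 2],
)

# The mean of a boolean column is the fraction of True values;
# transform broadcasts each group's value back to its rows.
df["num_true"] = df.groupby(df.index)["col1"].transform("mean")
print(df)
```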