Update 2022-03

This answer by caner using transform looks much better than my original answer!

df['sales'] / df.groupby('state')['sales'].transform('sum')

Thanks to this comment by Paul Rougieux for surfacing it.

Original Answer (2014)

Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way -- just groupby the state_office and divide the sales column by its sum. Copying the beginning of Paul H's answer:

# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
# group_keys=False keeps the original (state, office_id) index on newer pandas
state_pcts = state_office.groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum())

Returns:

                     sales
state office_id           
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508
Answer from exp1orer on Stack Overflow
Top answer
1 of 16
396

2 of 16
102

(This solution is inspired by this article: Understanding the Transform Function in Pandas)

I find the following solution the simplest (and probably the fastest), using transformation:

Transformation: While aggregation must return a reduced version of the data, transformation can return some transformed version of the full data to recombine. For such a transformation, the output is the same shape as the input.
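
To make the difference in output shape concrete, here is a minimal sketch. The toy frame below is hypothetical and is not the data used in the answers above; it only contrasts an aggregation with a transformation on the same grouping.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate shapes.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA", "WA"],
    "sales": [10, 30, 20, 20, 60],
})

# Aggregation: reduces to one value per group.
totals = df.groupby("state")["sales"].sum()

# Transformation: one value per original row, aligned with df's index.
row_totals = df.groupby("state")["sales"].transform("sum")

print(totals.shape)      # one entry per state
print(row_totals.shape)  # one entry per row of df
```

Because row_totals shares df's index, it can be divided into df['sales'] row by row, which is what the one-liner relies on.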

So, using transformation, the solution is a one-liner:

df['%'] = 100 * df['sales'] / df.groupby('state')['sales'].transform('sum')

And if you print:

print(df.sort_values(['state', 'office_id']).reset_index(drop=True))

   state  office_id   sales          %
0     AZ          2  195197   9.844309
1     AZ          4  877890  44.274352
2     AZ          6  909754  45.881339
3     CA          1  614752  50.415708
4     CA          3  395340  32.421767
5     CA          5  209274  17.162525
6     CO          1  549430  42.659629
7     CO          3  457514  35.522956
8     CO          5  280995  21.817415
9     WA          2  828238  35.696929
10    WA          4  719366  31.004563
11    WA          6  772590  33.298509
Medium
medium.com › data-science › 4-useful-tips-of-pandas-groupby-3744eefb1852
How to calculate percentage within groupby in Pandas? | by Andryw Marques | TDS Archive | Medium
December 26, 2022 - In most of the situations, we want to split the data into groups and do something with those groups. Usually Aggregation, Transformation, Filtration. I will use this toy dataframe of Sales to illustrate the tricks: ... This is the function I use most. Many times I use Groupby to summarize some values and I want to know what is the percentage of the values in each group, not in all the data.
Discussions

Pandas agg percent groupby of Groupby
I think what you want is this:

import pandas as pd
data = {'Col_1': ['A', 'A', 'B', 'B'],
        'Col_2': ['Test1', 'Test2', 'Test5', 'Test7'],
        'Values': [13.5, 31.5, 5, 45]}
df = pd.DataFrame(data)
sub_totals = df.groupby(['Col_1', 'Col_2']).agg(sum)
df_new = sub_totals.groupby(level='Col_1').apply(lambda x: x * 100 / x.sum())

You have to chain two groupby calls. With the first groupby you grab all the tests in each category and calculate their sum. With the second one you look at the individual tests per category and calculate their percentage based on the subtotal you got from the first groupby.
r/learnpython (reddit.com), June 10, 2021

python - Groupby count, then sum and get the percentage - Code Review Stack Exchange
I wrote this code. It works, but I think there is a more elegant and Pythonic way to this task. Groupby and count the different occurences Get the sum of all the occurences Divide each occurrenc...
codereview.stackexchange.com, June 4, 2019

ENH: Easily calculate percentages of your df or of groups (i.e. normalize)
Calculating percentages (or normalizing as it is called in pandas) should be made easier. When aggregating and analyzing data it is very common that you want to see the percentages instead of count...
github.com, February 9, 2022

Pandas group by column find percentage of count in each group
Surely you want to sum the total, not the survival indicator? train_df.groupby('Sex')['Total'].transform('sum') Then it's straightforward scalar arithmetic.
r/learnpython (reddit.com), December 7, 2022
Spark By {Examples}
sparkbyexamples.com › home › pandas › pandas percentage total with groupby
Pandas Percentage Total With Groupby - Spark By {Examples}
December 2, 2024 - To calculate the percentage of a column's total for each group in a Pandas DataFrame, you can use the groupby function in combination with transform to compute the percentage of the total within each group.
Statology
statology.org › home › pandas: how to calculate percentage of total within group
Pandas: How to Calculate Percentage of Total Within Group
June 11, 2022 - You can use the following syntax to calculate the percentage of a total within groups in pandas: df['values_var'] / df.groupby('group_var')['values_var'].transform('sum')
Reddit
reddit.com › r/learnpython › pandas agg percent groupby of groupby
r/learnpython on Reddit: Pandas agg percent groupby of Groupby
June 10, 2021

Hello,

Using pandas, I am trying to calculate the percentage per row within each subgroup of Col1.

Below is an example of what I want to get:

COL1  COL2    AGG1
A     Test1   30%
      Test 2  70%
B     Test 5  10%
      Test 7  90%

For now, I can get a groupby with count() for each row / subrow, and a subtotal with sidetable, or a percentage over the whole dataframe, but not one that adds up to 100% within each group.

COL1  COL2      AGG1
A     Test1     13.5
      Test 2    31.5
      Subtotal  45

Is it possible to do this in pure pandas, or do I need to parse / transform the dataframe in Python?

Any tips to help me reach that goal?

Thanks!
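
One way to get the per-group percentages asked for here is the transform pattern from the Stack Overflow answers. This is a sketch, not the poster's code: the column names and values are taken from the reply quoted under Discussions, and the "pct" column name is made up for illustration.

```python
import pandas as pd

# Data as used in the reply quoted under Discussions.
df = pd.DataFrame({
    "Col_1": ["A", "A", "B", "B"],
    "Col_2": ["Test1", "Test2", "Test5", "Test7"],
    "Values": [13.5, 31.5, 5.0, 45.0],
})

# Divide each value by the total of its Col_1 group.
# "pct" is a hypothetical column name.
group_total = df.groupby("Col_1")["Values"].transform("sum")
df["pct"] = 100 * df["Values"] / group_total

print(df)
```

Unlike the two chained groupby calls, this keeps the frame flat, with one row per test, and the percentages within each Col_1 group add up to 100.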

Quora
quora.com › How-do-you-handle-a-Groupby-with-value-counts-and-calculate-percentage-in-Pandas-Python-3-pandas-dataframe-development
How to handle a Groupby with value counts and calculate percentage in Pandas (Python 3, pandas, dataframe, development) - Quora
1) Quick: groupby + value_counts + normalize (pandas ≥1.1) Returns counts or percentages per group for a categorical column. ... Output columns: group_col, cat_col, pct (percentage within group_col).
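
The snippet elides the actual code, so the following is only a sketch of what a groupby + value_counts + normalize pipeline could look like. The column names group_col, cat_col and pct follow the snippet; the data is hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names come from the snippet above.
df = pd.DataFrame({
    "group_col": ["a", "a", "a", "b", "b"],
    "cat_col": ["x", "x", "y", "x", "y"],
})

pct = (
    df.groupby("group_col")["cat_col"]
      .value_counts(normalize=True)  # relative frequency within each group
      .mul(100)                      # proportion -> percentage
      .rename("pct")                 # rename before reset_index to avoid a name clash
      .reset_index()
)

print(pct)
```

Renaming the Series before reset_index matters on older pandas versions, where the result is named after the counted column and would collide with the index level of the same name.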
Top answer
1 of 2
6

I think your code is already nearly optimal and Pythonic, but there are a few small things to improve:

  • cluster_count.sum() returns a Series, so if you are going to use the result outside pandas, it is better to select the column: cluster_count.char.sum(). That gives you a single integer (a NumPy integer scalar) rather than a Series.
  • Pandas can operate on whole columns directly, so instead of using apply you can write the arithmetic on the column itself: cluster_count.char = cluster_count.char * 100 / cluster_sum (note that this line overwrites the column in place).

Here is the final code:

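import pandas as pd

df = pd.DataFrame({'char': ['a', 'b', 'c', 'd', 'e'], 'cluster': [1, 1, 2, 2, 2]})
cluster_count = df.groupby('cluster').count()   # number of rows per cluster
cluster_sum = cluster_count.char.sum()          # total number of rows
cluster_count.char = cluster_count.char * 100 / cluster_sum  # percentage per cluster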

Edit 1: You can even do it without the cluster_sum variable, in a single line of code:

cluster_count.char = cluster_count.char * 100 / cluster_count.char.sum()

Performance should be the same: cluster_count.char.sum() is evaluated once, before the element-wise division.

2 of 2
3

Just to add in my 2 cents here:

You can approach this with series.value_counts() which has a normalize parameter.

From the docs:

normalize : boolean, default False If True then the object returned will contain the relative frequencies of the unique values.

Using this we can do:

s=df.cluster.value_counts(normalize=True,sort=False).mul(100) # mul(100) is == *100
s.index.name,s.name='cluster','percentage_' #setting the name of index and series
print(s.to_frame()) #series.to_frame() returns a dataframe

          percentage_
cluster             
1               40.0
2               60.0
GitHub
github.com › pandas-dev › pandas › issues › 45898
ENH: Easily calculate percentages of your df or of groups (i.e. normalize) · Issue #45898 · pandas-dev/pandas
February 9, 2022 - Just a simple, unified way of getting percentages/proportions. Maybe the normalizing options should be something like: all, index, columns, group_all, group_index, group_columns
Author: SandervandenOord
Skytowner
skytowner.com › explore › calculating_the_percentage_of_each_value_in_each_group_in_pandas
Calculating the percentage of each value in each group in Pandas
To compute the percentage of each value in each distinct group in Pandas, call the groupby(~) method and then pass in the following function: lambda my_df: my_df / my_df.sum().
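
The snippet does not say which method the lambda is passed to. As a sketch, assuming it goes to transform on a single column (apply would be another option), the idea looks like this. The frame and the "share" column name are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1, 3, 2, 2],
})

# Assumption: the lambda is handed to transform, so it receives each
# group's values as a Series and returns that group's proportions.
df["share"] = df.groupby("group")["value"].transform(lambda s: s / s.sum())

print(df)
```

The result is a proportion between 0 and 1; multiply by 100 for a percentage. For large frames, transform('sum') followed by a plain division avoids calling a Python function once per group.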
Reddit
reddit.com › r/learnpython › pandas group by column find percentage of count in each group
r/learnpython on Reddit: Pandas group by column find percentage of count in each group
December 7, 2022

How do I calculate the percentage of each sub group?

Sex     Survived  Total
Female  1         233
        0         81
Male    0         468
        1         109

I want to get the percentage of each sub group, like below:

Sex     Survived  Total  Percentage
Female  1         233    74.20%
        0         81     25.80%
Male    0         468    81.11%
        1         109    18.89%

I tried the following but it didn't work:

train_df.groupby('Sex')['Survived'].transform('sum')
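
As the reply quoted earlier on this page points out, the column to sum is Total, not Survived. A minimal sketch of that fix follows. It assumes the table is a flat frame with Sex filled in on every row, which the post does not show explicitly.

```python
import pandas as pd

# Assumption: a flat frame with Sex repeated on each row;
# the numbers are the ones from the post.
train_df = pd.DataFrame({
    "Sex": ["Female", "Female", "Male", "Male"],
    "Survived": [1, 0, 0, 1],
    "Total": [233, 81, 468, 109],
})

# Sum Total within each Sex, broadcast back to every row.
group_total = train_df.groupby("Sex")["Total"].transform("sum")
train_df["Percentage"] = (100 * train_df["Total"] / group_total).round(2)

print(train_df)
```

Summing Survived instead only adds up the 0/1 indicator values, which is why the original attempt did not give a usable denominator.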

Medium
dongr0510.medium.com › how-to-use-pandas-to-get-the-percentage-value-of-a-column-within-a-group-a9bd801d63de
How to use pandas to get the percentage value of a column within a group? | by Jack Dong | Medium
June 15, 2024 - # Calculate the sales percentage for each product relative to the store's total sales df['Percentage of Store Sales'] = (df['Sales'] / df['Total Sales per Store']) * 100 ... import pandas as pd # Create a sample DataFrame data = { 'Store': ['Store ...
GeeksforGeeks
geeksforgeeks.org › how-to-calculate-the-percentage-of-a-column-in-pandas
How to calculate the Percentage of a column in Pandas ? - GeeksforGeeks
September 29, 2023 - This article shows how to calculate quantiles by group in Pandas using Python. There are many methods to calculate the quantile, but pandas provides the groupby.quantile() function to find it in a few simple lines of code. This is the method to use when the desired quantile falls between two points.
w3resource
w3resource.com › python-exercises › pandas › groupby › python-pandas-groupby-exercise-24.php
Pandas: Relative frequency within each group - w3resource
September 8, 2025 - Pandas Grouping and Aggregating Exercises, Practice and Solution: Write a Pandas program to split the following datasets into groups on customer_id to summarize purch_amt and calculate percentage of purch_amt in each group.
Analyticsvidhya
discuss.analyticsvidhya.com › techniques
How to find percentage of total with groupby pandas - techniques - Data Science, Analytics and Big Data discussions
May 3, 2018 - I have a CSV data set with columns such as Sales and Last_region. I want to calculate the percentage of sales for each region. I was able to find the sum of sales within each region, but I am not able to find the percentage within the groupby statement. Groupby statement used: tempsalesregion = customerdata.groupby(["Last_region"]) tempsalesregion = tempsalesregion[["Customer_Value"]].sum().add_prefix("Sum_of_").reset_index() tempsalesregion Output is But what I need is the percentage of sales p...
Medium
theitken.medium.com › pandas-tricks-calculate-percentage-within-group-d7be0bbaab6
Pandas Tricks - Calculate Percentage Within Group | by Ken | Medium
July 25, 2020 - Often you need to calculate % vs total on your summarized data. This article shows you some tricks to calculate percentage within groups
Iifx
iifx.dev › english › python
python - Pandas Tutorial: Mastering Groupby and Percentage Calculations - group by
To calculate the percentage of each value within a group, you divide the individual value by the total value for that group and multiply by 100. ... import pandas as pd # Sample DataFrame data = {'Product Category': ['Electronics', 'Electronics', ...