Update 2022-03
This answer by caner using transform looks much better than my original answer!
df['sales'] / df.groupby('state')['sales'].transform('sum')
Thanks to this comment by Paul Rougieux for surfacing it.
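As a quick illustration of why `transform` works here (a toy frame invented for this sketch, not the data from the answer below): `transform('sum')` broadcasts each group's total back to the original rows, so the division lines up index-for-index with the original column.

```python
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'CA', 'WA', 'WA'],
                   'sales': [100, 300, 50, 150]})
# transform('sum') returns a Series aligned with df, holding each row's group total
df['pct'] = 100 * df['sales'] / df.groupby('state')['sales'].transform('sum')
print(df['pct'].tolist())  # [25.0, 75.0, 25.0, 75.0]
```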
Original Answer (2014)
Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way -- just groupby the state_office and divide the sales column by its sum. Copying the beginning of Paul H's answer:
# From Paul H
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})

state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
state_pcts = state_office.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))
Returns:
                     sales
state office_id
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508
Answer from exp1orer on Stack Overflow
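As a side note, the same percentages can be computed without `apply` by broadcasting each state's total with `transform` on the outer MultiIndex level (a sketch under the same setup; this variant is not part of the original answer):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# transform('sum') broadcasts each state's total to every row of that state,
# so the division keeps the (state, office_id) MultiIndex intact
state_pcts = 100 * state_office / state_office.groupby(level=0).transform('sum')
# sanity check: each state's percentages add up to 100
print(state_pcts.groupby(level=0).sum())
```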
(This solution is inspired from this article: Understanding the Transform Function in Pandas)
I find the following solution to be the simplest (and probably the fastest), using transformation:
Transformation: While aggregation must return a reduced version of the data, transformation can return some transformed version of the full data to recombine. For such a transformation, the output is the same shape as the input.
So using transformation, the solution is a one-liner:
df['%'] = 100 * df['sales'] / df.groupby('state')['sales'].transform('sum')
And if you print:
print(df.sort_values(['state', 'office_id']).reset_index(drop=True))
    state  office_id   sales          %
0      AZ          2  195197   9.844309
1      AZ          4  877890  44.274352
2      AZ          6  909754  45.881339
3      CA          1  614752  50.415708
4      CA          3  395340  32.421767
5      CA          5  209274  17.162525
6      CO          1  549430  42.659629
7      CO          3  457514  35.522956
8      CO          5  280995  21.817415
9      WA          2  828238  35.696929
10     WA          4  719366  31.004563
11     WA          6  772590  33.298509
python - Groupby count, then sum and get the percentage - Code Review Stack Exchange
Pandas group by column find percentage of count in each group
Hello,
Using pandas, I am trying to calculate a percentage by row for each subgroup of COL1.
Below is an example of what I want to get:

COL1  COL2    AGG1
A     Test1    30%
      Test 2   70%
B     Test 5   10%
      Test 7   90%

For now, I can get a groupby with count() for each row/subrow, and a subtotal with sidetable, or percentages, but computed over the whole dataframe rather than as 100% within each group:

COL1  COL2      AGG1
A     Test1     13.5
      Test 2    31.5
      Subtotal  45

Is it possible to do this in pure pandas, or do I need to parse/transform the dataframe in Python?
Any tips to help me reach that goal?
Thanks !
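One possible pure-pandas sketch: aggregate the counts, then normalize within each COL1 group with `transform`. The toy data below is invented to reproduce the 30%/70% and 10%/90% example (spaces dropped from the test names, and `n` is a hypothetical raw-count column):

```python
import pandas as pd

df = pd.DataFrame({'COL1': ['A', 'A', 'B', 'B'],
                   'COL2': ['Test1', 'Test2', 'Test5', 'Test7'],
                   'n':    [3, 7, 1, 9]})
out = df.groupby(['COL1', 'COL2']).agg(AGG1=('n', 'sum'))
# normalize within each COL1 group so every group totals 100%
out['AGG1'] = 100 * out['AGG1'] / out.groupby(level=0)['AGG1'].transform('sum')
print(out)
```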
I think your code is already nearly optimal and Pythonic, but there are a few small things to improve:

- `cluster_count.sum()` returns a Series object, so if you are working with the result outside pandas it is better to specify the column: `cluster_count.char.sum()`. This way you get an ordinary Python integer.
- pandas can operate on columns directly, so instead of using `apply` you can write the arithmetic on the column itself: `cluster_count.char = cluster_count.char * 100 / cluster_sum` (note that this line of code works in place).
Here is the final code:
df = pd.DataFrame({'char': ['a', 'b', 'c', 'd', 'e'], 'cluster': [1, 1, 2, 2, 2]})
cluster_count = df.groupby('cluster').count()
cluster_sum = sum(cluster_count.char)
cluster_count.char = cluster_count.char * 100 / cluster_sum
Edit 1: You can do the magic even without cluster_sum variable, just in one line of code:
cluster_count.char = cluster_count.char * 100 / cluster_count.char.sum()
But I am not sure about its performance (it might recalculate the sum for each group).
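For what it's worth, in the one-line version the right-hand side's `.sum()` is evaluated once before the division, not once per group; a quick sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({'char': ['a', 'b', 'c', 'd', 'e'],
                   'cluster': [1, 1, 2, 2, 2]})
cluster_count = df.groupby('cluster').count()
# .sum() runs once here; the division is then a single vectorized operation
cluster_count.char = cluster_count.char * 100 / cluster_count.char.sum()
print(cluster_count.char.tolist())  # [40.0, 60.0]
```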
Just to add in my 2 cents here:
You can approach this with series.value_counts() which has a normalize parameter.
From the docs:
normalize : boolean, default False If True then the object returned will contain the relative frequencies of the unique values.
Using this we can do:
s = df.cluster.value_counts(normalize=True, sort=False).mul(100)  # mul(100) == * 100
s.index.name, s.name = 'cluster', 'percentage_'  # set the index and series names
print(s.to_frame())  # series.to_frame() returns a dataframe
percentage_
cluster
1 40.0
2 60.0
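The same `normalize` flag also works per group via `groupby(...).value_counts`, which yields within-group percentages directly (a small sketch with made-up data, not from the answer above):

```python
import pandas as pd

df = pd.DataFrame({'grp': ['x', 'x', 'x', 'y', 'y'],
                   'val': ['a', 'a', 'b', 'a', 'b']})
# normalize=True makes the frequencies relative within each grp group
pct = df.groupby('grp')['val'].value_counts(normalize=True).mul(100)
print(pct)
```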
How do I calculate the percentage of each subgroup?
| Sex | Survived | Total |
|---|---|---|
| Female | 1 | 233 |
| Female | 0 | 81 |
| Male | 0 | 468 |
| Male | 1 | 109 |
I want to get the percentage of each sub group, like below:
| Sex | Survived | Total | Percentage |
|---|---|---|---|
| Female | 1 | 233 | 74.20% |
| Female | 0 | 81 | 25.80% |
| Male | 0 | 468 | 81.11% |
| Male | 1 | 109 | 18.89% |
I tried the following, but it didn't work:
train_df.groupby('Sex')['Survived'].transform('sum')
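The attempt above broadcasts each sex's survivor sum back to the rows but never divides the group counts by the group totals. One way to finish it (a sketch built on toy data reconstructed from the counts in the table above; `train_df` and its columns are assumed from the question):

```python
import pandas as pd

# toy data matching the counts in the question's table
train_df = pd.DataFrame({'Sex': ['Female'] * 314 + ['Male'] * 577,
                         'Survived': [1] * 233 + [0] * 81 + [0] * 468 + [1] * 109})
counts = train_df.groupby(['Sex', 'Survived']).size()
# divide each (Sex, Survived) count by that sex's total
pcts = 100 * counts / counts.groupby(level=0).transform('sum')
print(pcts.round(2))
```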