Brave Search

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

stackoverflow.com › questions › 19384532 › get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby

Quick Answer:

The simplest way to get row counts per group is by calling .size(), which returns a Series:

df.groupby(['col1','col2']).size()

Usually you want this result as a DataFrame (instead of a Series) so you can do:

df.groupby(['col1', 'col2']).size().reset_index(name='counts')

If you want to find out how to calculate the row counts and other statistics for each group, continue reading below.

Detailed example:

Consider the following example dataframe:

In [2]: df
Out[2]: 
  col1 col2  col3  col4  col5  col6
0    A    B  0.20 -0.61 -0.49  1.49
1    A    B -1.53 -1.01 -0.39  1.82
2    A    B -0.44  0.27  0.72  0.11
3    A    B  0.28 -1.32  0.38  0.18
4    C    D  0.12  0.59  0.81  0.66
5    C    D -0.13 -1.65 -1.64  0.50
6    C    D -1.42 -0.11 -0.18 -0.44
7    E    F -0.00  1.42 -0.26  1.17
8    E    F  0.91 -0.47  1.35 -0.34
9    G    H  1.48 -0.63 -1.14  0.17

First let's use .size() to get the row counts:

In [3]: df.groupby(['col1', 'col2']).size()
Out[3]: 
col1  col2
A     B       4
C     D       3
E     F       2
G     H       1
dtype: int64

Then let's use .size().reset_index(name='counts') to get the row counts as a DataFrame:

In [4]: df.groupby(['col1', 'col2']).size().reset_index(name='counts')
Out[4]: 
  col1 col2  counts
0    A    B       4
1    C    D       3
2    E    F       2
3    G    H       1

Including results for more statistics

When you want to calculate statistics on grouped data, it usually looks like this:

In [5]: (df
   ...:  .groupby(['col1', 'col2'])
   ...:  .agg({
   ...:        'col3': ['mean', 'count'], 
   ...:        'col4': ['median', 'min', 'count']
   ...:  })
   ...: )
Out[5]: 
            col4                  col3      
          median   min count      mean count
col1 col2                                   
A    B    -0.810 -1.32     4 -0.372500     4
C    D    -0.110 -1.65     3 -0.476667     3
E    F     0.475 -0.47     2  0.455000     2
G    H    -0.630 -0.63     1  1.480000     1

The result above is a little annoying to deal with because of the nested column labels, and also because row counts are on a per column basis.

To gain more control over the output I usually split the statistics into individual aggregations that I then combine using join. It looks like this:

In [6]: gb = df.groupby(['col1', 'col2'])
   ...: counts = gb.size().to_frame(name='counts')
   ...: (counts
   ...:  .join(gb.agg({'col3': 'mean'}).rename(columns={'col3': 'col3_mean'}))
   ...:  .join(gb.agg({'col4': 'median'}).rename(columns={'col4': 'col4_median'}))
   ...:  .join(gb.agg({'col4': 'min'}).rename(columns={'col4': 'col4_min'}))
   ...:  .reset_index()
   ...: )
   ...: 
Out[6]: 
  col1 col2  counts  col3_mean  col4_median  col4_min
0    A    B       4  -0.372500       -0.810     -1.32
1    C    D       3  -0.476667       -0.110     -1.65
2    E    F       2   0.455000        0.475     -0.47
3    G    H       1   1.480000       -0.630     -0.63
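As a possible alternative to the join-based approach above, named aggregation (available in pandas 0.25 and later) can produce flat, explicitly named columns in a single agg call. This is a sketch rather than the answer author's method; the small frame below is hypothetical data that only mirrors the column layout of the example.

```python
import pandas as pd

# Hypothetical data with the same column layout as the example frame.
df = pd.DataFrame({
    "col1": ["A", "A", "C", "C", "C"],
    "col2": ["B", "B", "D", "D", "D"],
    "col3": [0.5, 1.5, 1.0, 2.0, 3.0],
    "col4": [1.0, 3.0, -1.0, 0.0, 4.0],
})

summary = (
    df.groupby(["col1", "col2"])
      .agg(
          counts=("col3", "size"),         # rows per group, NaN rows included
          col3_mean=("col3", "mean"),
          col4_median=("col4", "median"),
          col4_min=("col4", "min"),
      )
      .reset_index()                       # turn the group keys back into columns
)
print(summary)
```

Each keyword becomes an output column name, so no rename step or nested column labels are involved.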

Footnotes

The code used to generate the test data is shown below:

In [1]: import numpy as np
   ...: import pandas as pd 
   ...: 
   ...: keys = np.array([
   ...:     ['A', 'B'],
   ...:     ['A', 'B'],
   ...:     ['A', 'B'],
   ...:     ['A', 'B'],
   ...:     ['C', 'D'],
   ...:     ['C', 'D'],
   ...:     ['C', 'D'],
   ...:     ['E', 'F'],
   ...:     ['E', 'F'],
   ...:     ['G', 'H'] 
   ...: ])
   ...: 
   ...: df = pd.DataFrame(
   ...:     np.hstack([keys, np.random.randn(10, 4).round(2)]), 
   ...:     columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
   ...: )
   ...: 
   ...: df[['col3', 'col4', 'col5', 'col6']] = \
   ...:     df[['col3', 'col4', 'col5', 'col6']].astype(float)
   ...:

Disclaimer:

If some of the columns that you are aggregating have null values, then you really want to compute the group row counts as an independent aggregation for each column (for example with 'count', which counts only non-null values). Otherwise you may be misled as to how many records are actually being used to calculate things like the mean, because pandas silently drops NaN entries in the mean calculation.
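To illustrate the point about null values, the sketch below compares size(), which counts every row in a group, with count(), which counts only non-null values. The frame and the column names key and val are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" has one NaN out of three rows,
# group "b" has one NaN out of two rows.
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "val": [1.0, np.nan, 3.0, np.nan, 10.0],
})

gb = df.groupby("key")["val"]
report = pd.DataFrame({
    "rows": gb.size(),       # all rows in the group, NaN included
    "non_null": gb.count(),  # only the values that mean() actually uses
    "mean": gb.mean(),       # NaN entries are skipped
})
print(report)
```

Where rows and non_null differ, the mean was computed from fewer records than the group contains.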

Answer from Pedro M Duarte on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 19384532 › get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby

python - Get statistics for each group (such as count, mean, etc) using pandas GroupBy? - Stack Overflow

On groupby object, the agg function can take a list to apply several aggregation methods at once. This should give you the result you need:

df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
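Passing a list to agg yields two-level column labels. One way to flatten them, sketched here on a small hypothetical frame, is to join each (column, statistic) pair into a single name.

```python
import pandas as pd

# Hypothetical data using the same column names as the answer.
df = pd.DataFrame({
    "col1": ["A", "A", "C"],
    "col2": ["B", "B", "D"],
    "col3": [1.0, 3.0, 5.0],
    "col4": [2.0, 4.0, 6.0],
})

stats = (
    df[["col1", "col2", "col3", "col4"]]
    .groupby(["col1", "col2"])
    .agg(["mean", "count"])
)

# Columns are tuples such as ("col3", "mean"); join them into "col3_mean".
stats.columns = ["_".join(pair) for pair in stats.columns]
stats = stats.reset_index()
print(stats)
```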

Spark By {Examples}

sparkbyexamples.com › home › pandas › pandas groupby() and count() with examples

Pandas groupby() and count() with Examples

July 3, 2025 - Pandas groupby().count() is used to group columns and count the number of occurrences of each unique value in a specific column or combination of columns.

Discussions

Pandas groupby count to new column

Code: PrimaryHazing = birdcsv.groupby(['Section'])['Primary Dispersal Method'].value_counts().reset_index(name="count")

r/learnpython

April 29, 2021

How to do a value count in groupby with pandas?

Hard to visualize without posting some code and example dataset but can't you use nlargest? https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.nlargest.html

r/learnpython

October 28, 2018

Iterating through pandas df & groupby

Try this:

df_result = (
    df_headlines
    .groupby(['date', 'score'])
    .size()  # Get the number of rows for each (date, score) combination
    .reset_index()  # Convert back to a dataframe with 3 columns: ['date', 'score', 0]
    .rename(columns={0: 'row_count'})  # The "size" column gets named 0 by default, rename it
    .sort_values('row_count')  # We are looking for max row count, so sort by it
    .groupby(['date'], as_index=False)
    .last()  # Since we sorted by row count, we want the row with maximum for each, which will be last
    .drop('row_count', axis=1)  # Drop our intermediate row_count columns
)

You can remove the comments obviously, feel free to ask any questions.

EDIT: In general, you want to avoid iterating over DataFrames whenever possible. Built-in (vectorized) functions are usually much faster.
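A shorter variant of the same idea, offered as a sketch rather than the poster's code: count rows per (date, score) pair, then pick the row with the largest count for each date using idxmax. The contents of df_headlines here are hypothetical and contain no ties.

```python
import pandas as pd

# Hypothetical stand-in for df_headlines.
df_headlines = pd.DataFrame({
    "date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
    "score": [1, 1, 0, -1, 0, -1],
})

# One row per (date, score) with the number of headlines in that pair.
counts = (
    df_headlines
    .groupby(["date", "score"])
    .size()
    .reset_index(name="row_count")
)

# idxmax returns, per date, the index label of the row with the largest count.
top = (
    counts
    .loc[counts.groupby("date")["row_count"].idxmax(), ["date", "score"]]
    .reset_index(drop=True)
)
print(top)
```

If two scores tie within a date, idxmax keeps the first one it encounters, so a tie-breaking rule would need to be added for real data.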

Videos

04:25

YouTube

Pandas GroupBy Count - YouTube

Count Rows by Group in pandas DataFrame in Python (Examples) | ...

July 15, 2022

01:50

YouTube

How to Count Size of Groupby Groups in Pandas Efficiently - YouTube

August 26, 2025

10:27

YouTube

Mastering Pandas .groupby() in 10 Minutes! | Step-by-Step Python ...

July 29, 2025

09:02

YouTube

How To Count Data In A DataFrame - Pandas For Machine Learning ...

September 26, 2022

NBShare

nbshare.io › notebook › 352981863 › Pandas-Groupby-Count-of-Rows-In-Each-Group

Pandas Groupby Count of Rows In Each Group

# Group the dataframe by the 'Director' ... Martin Scorsese 2 Quentin Tarantino 1 Steven Spielberg 3 dtype: int64 · You can also use the count() method to get the count of rows for each group....

TechBeamers

techbeamers.com › pandas-groupby-count

Python: Pandas GroupBy() and Count() Methods - TechBeamers

November 30, 2025 - To address this, we’ll utilize the Pandas GroupBy operation to group the values based on the ‘Type’ column. Subsequently, we’ll employ the count function to calculate the number of occurrences within each category.

Pandas

pandas.pydata.org › pandas-docs › version › 2.1.1 › reference › api › pandas.core.groupby.SeriesGroupBy.count.html

pandas.core.groupby.SeriesGroupBy.count — pandas 2.1.1 documentation

Count of values within each group. ... Apply a function groupby to a Series.

IncludeHelp

includehelp.com › python › pandas-groupby-count-and-mean-combined.aspx

Python - Pandas Groupby: Count and mean combined

September 17, 2023 -

# Importing pandas package
import pandas as pd

# Creating a dictionary
d = {
    'X': ['one','two','two','one','one'],
    'Y': [10,10,20,30,30]
}

# Creating a DataFrame
df = pd.DataFrame(d)

# Display original DataFrame
print("Original DataFrame:\n",df,"\n")

# Using groupby and aggregate
df = df.groupby('X').agg({'X':'size', 'Y':'mean'}).rename(columns={'X':'count','Y':'mean_sent'})

# Display modified DataFrame
print("Modified DataFrame:\n",df)

GeeksforGeeks

geeksforgeeks.org › python › pandas-groupby-count-the-occurrences-of-each-combination

Pandas GroupBy - Count the occurrences of each combination - GeeksforGeeks

July 23, 2025 - It will generate the number of similar data counts present in a particular column of the data frame. We will start by creating a simple DataFrame for demonstration purposes: ... import pandas as pd import numpy as np # Initialize data with sample ...

FavTutor

favtutor.com › blogs › pandas-groupby-count

Pandas Groupby Count Using Size() and Count() Method

February 10, 2022 - Instead of the size() method, you can also use the pandas groupby count() method to count the values of each column in each group. Note that the counts equal the group sizes when there are no NaN values in the ...

Data Science Parichay

datascienceparichay.com › home › blog › pandas groupby – count of rows in each group

Pandas Groupby - Count of rows in each group - Data Science Parichay

September 7, 2022 - You can use the pandas groupby size() function to count the number of rows in each group of a groupby object.

Pandas

pandas.pydata.org › docs › user_guide › groupby.html

Group by: split-apply-combine — pandas 3.0.1 documentation

The name GroupBy should be quite familiar to those who have used a SQL-based tool (or itertools), in which you can write code like: SELECT Column1, Column2, mean(Column3), sum(Column4) FROM SomeTable GROUP BY Column1, Column2 · We aim to make operations like this natural and easy to express using pandas...
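The SQL query quoted above could be expressed in pandas roughly as follows. This is a sketch: some_table and its contents are hypothetical stand-ins for SomeTable, and the output column names mean_Column3 and sum_Column4 are chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for SomeTable from the SQL snippet.
some_table = pd.DataFrame({
    "Column1": ["a", "a", "b"],
    "Column2": ["x", "x", "y"],
    "Column3": [1.0, 3.0, 5.0],
    "Column4": [10, 20, 30],
})

# SELECT Column1, Column2, mean(Column3), sum(Column4)
# FROM SomeTable GROUP BY Column1, Column2
result = (
    some_table
    .groupby(["Column1", "Column2"])
    .agg(mean_Column3=("Column3", "mean"), sum_Column4=("Column4", "sum"))
    .reset_index()
)
print(result)
```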

reddit.com › r/learnpython › pandas groupby count to new column

r/learnpython on Reddit: Pandas groupby count to new column

April 29, 2021

Hello, I am wondering what to use to make the count of a pandas dataframe go into new columns.

In my example here there is "primary dispersal". This consists of Pyro, Truck, LRAD.

What I want to do is figure out how many times each was used in a given area. In the IMGUR snip I have the results correct, just not how I want to present them.

I would like it to be the section name, then each type (pyro, truck etc) as its own column with the count in the attribute cell.

I have included the code line I wrote, the desired output and the current result.

Basically I am doing a few different queries which I will then merge all together to create a database on some results.

I don't know how to put the code in here with proper formatting, so I hope IMGUR works as it's just a line of code.

https://imgur.com/a/vqYKQYv

Top answer

1 of 2

Code: PrimaryHazing = birdcsv.groupby(['Section'])['Primary Dispersal Method'].value_counts().reset_index(name="count")

2 of 2

Do you mean a pivot table? PrimaryHazing.pivot(index='Section', columns='Primary Dispersal Method', values='count')
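Putting the two answers together, a runnable sketch might look like the following. The contents of birdcsv are hypothetical (the original data is not shown), and the fillna(0) step is an addition that replaces missing section/method combinations with zero.

```python
import pandas as pd

# Hypothetical stand-in for the poster's birdcsv data.
birdcsv = pd.DataFrame({
    "Section": ["North", "North", "North", "South", "South"],
    "Primary Dispersal Method": ["Pyro", "Pyro", "Truck", "LRAD", "Pyro"],
})

# Long format: one row per (Section, method) with its count.
PrimaryHazing = (
    birdcsv.groupby(["Section"])["Primary Dispersal Method"]
    .value_counts()
    .reset_index(name="count")
)

# Wide format: one row per Section, one column per method.
wide = (
    PrimaryHazing
    .pivot(index="Section", columns="Primary Dispersal Method", values="count")
    .fillna(0)      # combinations that never occurred become 0
    .astype(int)
)
print(wide)
```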

Pandas

pandas.pydata.org › pandas-docs › version › 0.22 › generated › pandas.core.groupby.GroupBy.count.html

GroupBy.count

pandas.core.groupby.GroupBy.count · pandas.core.groupby.GroupBy.cumcount · pandas.core.groupby.GroupBy.first · pandas.core.groupby.GroupBy.head · pandas.core.groupby.GroupBy.last · pandas.core.groupby.GroupBy.max · pandas.core.groupby.GroupBy.mean · pandas.core.groupby.GroupBy.median ·

Pandas

pandas.pydata.org › pandas-docs › version › 2.2 › reference › api › pandas.core.groupby.DataFrameGroupBy.count.html

pandas.core.groupby.DataFrameGroupBy.count — pandas 2.2.3 documentation

>>> data = [[1, np.nan, 3], [1, np.nan, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["cow", "horse", "bull"])
>>> df
       a    b  c
cow    1  NaN  3
horse  1  NaN  6
bull   7  8.0  9
>>> df.groupby("a").count()
   b  c
a
1  0  2
7  1  1

Pandas

pandas.pydata.org › docs › reference › groupby.html

GroupBy — pandas 3.0.1 documentation

pandas.api.typing.DataFrameGroupBy and pandas.api.typing.SeriesGroupBy instances are returned by groupby calls pandas.DataFrame.groupby() and pandas.Series.groupby() respectively · DataFrameGroupBy.__iter__()

TutorialsPoint

tutorialspoint.com › article › how-to-count-unique-values-in-a-pandas-groupby-object

How to Count Unique Values in a Pandas Groupby Object?

August 3, 2023 - By counting the number of unique values in a Groupby object, we can gain insights into the diversity and distribution of the data within each group. To count unique values in a pandas Groupby object, we need to use the nunique() method. This method ...
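A minimal sketch of nunique() on a grouped column, using hypothetical team and player columns:

```python
import pandas as pd

# Hypothetical data: team "x" has two distinct players, team "y" has one.
df = pd.DataFrame({
    "team": ["x", "x", "x", "y", "y"],
    "player": ["ann", "bob", "ann", "cat", "cat"],
})

# Number of distinct player values within each team.
unique_players = df.groupby("team")["player"].nunique()
print(unique_players)
```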

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.groupby.html

pandas.DataFrame.groupby — pandas 3.0.1 documentation

If a list or ndarray of length equal to the number of rows is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self.

Theprogrammingassignmenthelp

theprogrammingassignmenthelp.com › pandas-groupby-count-in-python

Pandas Groupby Count in Python | Python Tutors

This journey focuses on group by-count, a powerful duo in pandas. groupby organizes your data, and count() tells you how many times things appear within each group.

Dataquest

dataquest.io › blog › grouping-data-a-step-by-step-tutorial-to-groupby-in-pandas

Grouping Data: A Step-by-Step Tutorial to Using GroupBy in Pandas (2022) – Dataquest

March 6, 2023 - Any groupby process involves some combination of the following 3 steps: splitting the original object into groups based on the defined criteria, applying a function to each group, and combining the results. Let's explore this split-apply-combine chain step-by-step with an example from a Kaggle Nobel Prize Dataset:

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
df = pd.read_csv('complete.csv')
df = df[['awardYear', 'category', 'prizeAmount', 'prizeAmountAdjusted', 'name', 'gender', 'birth_continent']]
df.head()

GeeksforGeeks

geeksforgeeks.org › python › pandas-groupby-value-counts-on-the-dataframe

Pandas - Groupby value counts on the DataFrame - GeeksforGeeks

July 23, 2025 - To count groupby values in a pandas DataFrame, we are going to use the groupby(), size(), and unstack() methods.
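The groupby(), size(), and unstack() combination mentioned above can be sketched as follows. The city and status columns are hypothetical, and fill_value=0 is used so that combinations with no rows show 0 instead of NaN.

```python
import pandas as pd

# Hypothetical data: Rome has no "closed" rows.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Oslo", "Rome", "Rome"],
    "status": ["open", "closed", "open", "open", "open"],
})

# size() counts rows per (city, status); unstack() moves status into columns.
table = df.groupby(["city", "status"]).size().unstack(fill_value=0)
print(table)
```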

Pandas

pandas.pydata.org › pandas-docs › version › 1.5 › reference › api › pandas.core.groupby.DataFrameGroupBy.count.html

pandas.core.groupby.DataFrameGroupBy.count — pandas 1.5.2 documentation

Count of values within each group. ... Apply a function groupby to a Series.