Pandas >= 0.16
pd.Series.mode is available!
Use groupby, GroupBy.agg, and apply the pd.Series.mode function to each group:
source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
Name: Short name, dtype: object
If this is needed as a DataFrame, use
source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode).to_frame()
Short name
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
The useful thing about Series.mode is that it always returns a Series, making it very compatible with agg and apply, especially when reconstructing the groupby output. It is also faster.
# Accepted answer.
%timeit source.groupby(['Country','City']).agg(lambda x:x.value_counts().index[0])
# Proposed in this post.
%timeit source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
5.56 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.76 ms ± 387 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Dealing with Multiple Modes
Series.mode also does a good job when there are multiple modes:
source2 = source.append(
pd.Series({'Country': 'USA', 'City': 'New-York', 'Short name': 'New'}),
ignore_index=True)
# Now `source2` has two modes for the
# ("USA", "New-York") group, they are "NY" and "New".
source2
Country City Short name
0 USA New-York NY
1 USA New-York New
2 Russia Sankt-Petersburg Spb
3 USA New-York NY
4 USA New-York New
source2.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
Country City
Russia Sankt-Petersburg Spb
USA New-York [NY, New]
Name: Short name, dtype: object
Or, if you want a separate row for each mode, you can use GroupBy.apply:
source2.groupby(['Country','City'])['Short name'].apply(pd.Series.mode)
Country City
Russia Sankt-Petersburg 0 Spb
USA New-York 0 NY
1 New
Name: Short name, dtype: object
If you don't care which mode is returned as long as it's either one of them, then you will need a lambda that calls mode and extracts the first result.
source2.groupby(['Country','City'])['Short name'].agg(
lambda x: pd.Series.mode(x)[0])
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
Name: Short name, dtype: object
Alternatives to (not) consider
You can also use statistics.mode from python, but...
source.groupby(['Country','City'])['Short name'].apply(statistics.mode)
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
Name: Short name, dtype: object
...it does not work well when having to deal with multiple modes; a StatisticsError is raised. This is mentioned in the docs:
If data is empty, or if there is not exactly one most common value, StatisticsError is raised.
But you can see for yourself...
statistics.mode([1, 2])
# ---------------------------------------------------------------------------
# StatisticsError Traceback (most recent call last)
# ...
# StatisticsError: no unique mode; found 2 equally common values
Answer from coldspeed95 on Stack OverflowPandas >= 0.16
pd.Series.mode is available!
Use groupby, GroupBy.agg, and apply the pd.Series.mode function to each group:
source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
Name: Short name, dtype: object
If this is needed as a DataFrame, use
source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode).to_frame()
Short name
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
The useful thing about Series.mode is that it always returns a Series, making it very compatible with agg and apply, especially when reconstructing the groupby output. It is also faster.
# Accepted answer.
%timeit source.groupby(['Country','City']).agg(lambda x:x.value_counts().index[0])
# Proposed in this post.
%timeit source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
5.56 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.76 ms ± 387 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Dealing with Multiple Modes
Series.mode also does a good job when there are multiple modes:
source2 = source.append(
pd.Series({'Country': 'USA', 'City': 'New-York', 'Short name': 'New'}),
ignore_index=True)
# Now `source2` has two modes for the
# ("USA", "New-York") group, they are "NY" and "New".
source2
Country City Short name
0 USA New-York NY
1 USA New-York New
2 Russia Sankt-Petersburg Spb
3 USA New-York NY
4 USA New-York New
source2.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
Country City
Russia Sankt-Petersburg Spb
USA New-York [NY, New]
Name: Short name, dtype: object
Or, if you want a separate row for each mode, you can use GroupBy.apply:
source2.groupby(['Country','City'])['Short name'].apply(pd.Series.mode)
Country City
Russia Sankt-Petersburg 0 Spb
USA New-York 0 NY
1 New
Name: Short name, dtype: object
If you don't care which mode is returned as long as it's either one of them, then you will need a lambda that calls mode and extracts the first result.
source2.groupby(['Country','City'])['Short name'].agg(
lambda x: pd.Series.mode(x)[0])
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
Name: Short name, dtype: object
Alternatives to (not) consider
You can also use statistics.mode from python, but...
source.groupby(['Country','City'])['Short name'].apply(statistics.mode)
Country City
Russia Sankt-Petersburg Spb
USA New-York NY
Name: Short name, dtype: object
...it does not work well when having to deal with multiple modes; a StatisticsError is raised. This is mentioned in the docs:
If data is empty, or if there is not exactly one most common value, StatisticsError is raised.
But you can see for yourself...
statistics.mode([1, 2])
# ---------------------------------------------------------------------------
# StatisticsError Traceback (most recent call last)
# ...
# StatisticsError: no unique mode; found 2 equally common values
You can use value_counts() to get a count series, and get the first row:
source.groupby(['Country','City']).agg(lambda x: x.value_counts().index[0])
In case you are wondering about performing other agg functions in the .agg(),
try this.
# Let's add a new col, "account"
source['account'] = [1, 2, 3, 3]
source.groupby(['Country','City']).agg(
mod=('Short name', lambda x: x.value_counts().index[0]),
avg=('account', 'mean'))
Videos
This question is really a duplicate of this Stack Overflow post. The most intuitive solution is to use pd.Series.mode in groupby but the fastest solution is to drop duplicates of a value_counts result.
counts = df.value_counts(['season', 'league_name', 'home_winner'])
result = counts[~counts.droplevel('home_winner').index.duplicated()]
A toy example / output:
df = pd.DataFrame({
'season': [1, 1, 1, 1, 1, 1],
'league_name': [*'AAABBB'],
'home_winner':[*'abaded']
})
counts = df.value_counts(['season', 'league_name', 'home_winner'])
result = counts[~counts.droplevel('home_winner').index.duplicated()]
print(result)
season league_name home_winner
1 A a 2
B d 2
def get_max(g):
g.count().sort_values(by='count', ascending =False).loc[0,:]
Groupby().apply(get_max)