This is more like a transform:

df['help_column'] = df.groupby('ID')['percentage'].transform('max')
Answer from BENY on Stack Overflow
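A minimal runnable sketch of that pattern (the frame and the ID/percentage values here are made up for illustration):

```python
import pandas as pd

# Toy frame with two groups; column names follow the snippet above.
df = pd.DataFrame({
    "ID": ["a", "a", "b", "b"],
    "percentage": [10, 30, 25, 5],
})

# transform('max') broadcasts each group's maximum back to every row,
# so the result aligns with the original index.
df["help_column"] = df.groupby("ID")["percentage"].transform("max")
print(df)
```

Unlike `groupby(...).max()`, which returns one row per group, `transform` keeps the original shape, which is what makes it usable as a helper column.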
Pandas: Get the max value of a group ONLY if the value satisfies given conditions
Can't you just sort by possible, then total, and keep the last record for each group? Sorting keeps your groups together; sorting by possible puts False before True within each group, so if you have any True values, keeping the last record will pick a True. Then sorting by total ensures the last value is the largest of the Trues, or, if all are False, the largest of the Falses.
r/learnpython on Reddit: Pandas: Get the max value of a group ONLY if the value satisfies given conditions
September 20, 2022

I have a huge dataset.

The data is grouped by col, row, year, no, potveg, and total. I am trying to get the maximum value of the 'total' column within each year group ONLY if its 'possible' value is TRUE. If the row with the max 'total' has 'possible' = FALSE, then take the second max, and so on.

If all the values of the 'possible' column in a specific year group = False, then I want to pick the max out of the False so that I don't skip any years.

i.e., for the dataset below:

col	row	year	no	potveg	total	possible
-125	42.5	2015	1	9	697.3	FALSE
-125	42.5	2015	2	13	535.2	TRUE
-125	42.5	2015	3	15	82.3	TRUE
-125	42.5	2016	1	9	907.8	TRUE
-125	42.5	2016	2	13	137.6	FALSE
-125	42.5	2016	3	15	268.4	TRUE
-125	42.5	2017	1	9	961.9	FALSE
-125	42.5	2017	2	13	74.2	TRUE
-125	42.5	2017	3	15	248	TRUE
-125	42.5	2018	1	9	937.9	TRUE
-125	42.5	2018	2	13	575.6	TRUE
-125	42.5	2018	3	15	215.5	FALSE
-135	70.5	2015	1	8	697.3	FALSE
-135	70.5	2015	2	10	535.2	TRUE
-135	70.5	2015	3	19	82.3	TRUE
-135	70.5	2016	1	8	907.8	TRUE
-135	70.5	2016	2	10	137.6	FALSE
-135	70.5	2016	3	19	268.4	TRUE
-135	70.5	2017	1	8	961.9	FALSE
-135	70.5	2017	2	10	74.2	TRUE
-135	70.5	2017	3	19	248	TRUE
-135	70.5	2018	1	8	937.9	TRUE
-135	70.5	2018	2	10	575.6	TRUE
-135	70.5	2018	3	19	215.5	FALSE
-135	70.5	2019	1	8	937.9	FALSE
-135	70.5	2019	2	10	575.6	FALSE
-135	70.5	2019	3	19	215.5	FALSE

The output would be:

col	row	year	no	potveg	total	possible
-125	42.5	2015	2	13	535.2	TRUE
-125	42.5	2016	1	9	907.8	TRUE
-125	42.5	2017	3	15	248	TRUE
-125	42.5	2018	1	9	937.9	TRUE
-135	70.5	2015	2	10	535.2	TRUE
-135	70.5	2016	1	8	907.8	TRUE
-135	70.5	2017	3	19	248	TRUE
-135	70.5	2018	1	8	937.9	TRUE
-135	70.5	2019	1	8	937.9	FALSE

I have tried

# Separate out the true and false possibilities by grouping by ['col','row','year','possible']
#  and getting the idxmax for column total. At the end, we sort the result on possible in descending order.
#  This  puts all idxmax values (now in total) with True in possible first.

idx = df.groupby(['col','row','year','possible'], as_index=False)['total']\
    .idxmax().sort_values('possible', ascending=False)['total']

#we then apply a second groupby, this time only on['col', 'row', 'year'] and simply get the first.

result = df.iloc[idx].groupby(['col', 'row', 'year']).first()
orig_index = df.set_index(['col', 'row', 'year']).index.drop_duplicates()

#re-establishing the original order by using df.reindex based on the original df 
# with index there set to ['col','row','year'] and getting rid of the duplicates first.

result_reordered = result.reindex(orig_index)

But I am still getting some years where the max value is not picked, resulting in duplicates.
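As a sketch of the sort-then-keep-last suggestion from the discussion above (not the poster's idxmax attempt), applied to a small slice of the question's data:

```python
import pandas as pd

# A small slice of the question's data: two year-groups per location.
df = pd.DataFrame({
    "col":      [-125, -125, -125, -135, -135, -135],
    "row":      [42.5, 42.5, 42.5, 70.5, 70.5, 70.5],
    "year":     [2017, 2017, 2017, 2019, 2019, 2019],
    "total":    [961.9, 74.2, 248.0, 937.9, 575.6, 215.5],
    "possible": [False, True, True, False, False, False],
})

# Sort so that, within each (col, row, year) group, True rows come after
# False rows and larger totals come last; the last row of each group is
# then the max True total, or the max False total if no row is True,
# so no year is ever skipped.
result = (df.sort_values(["possible", "total"])
            .groupby(["col", "row", "year"])
            .tail(1)
            .sort_values(["col", "row", "year"])
            .reset_index(drop=True))
print(result)
```

For the 2017 group this keeps the 248.0/True row rather than the larger 961.9/False row, and for the all-False 2019 group it falls back to the 937.9/False row, matching the expected output above.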

Top answer
1 of 4
320

You can get the maximum like this:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
>>> df
   A  B
0  1 -2
1  2  8
2  3  1
>>> df[["A", "B"]]
   A  B
0  1 -2
1  2  8
2  3  1
>>> df[["A", "B"]].max(axis=1)
0    1
1    8
2    3

and so:

>>> df["C"] = df[["A", "B"]].max(axis=1)
>>> df
   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3

If you know that "A" and "B" are the only columns, you could even get away with

>>> df["C"] = df.max(axis=1)

And you could use .apply(max, axis=1) too, I guess.

2 of 4
55

@DSM's answer is perfectly fine in almost any normal scenario. But if you're the type of programmer who wants to go a little deeper than the surface level, you might be interested to know that it is a little faster to call numpy functions on the underlying .to_numpy() (or .values for <0.24) array instead of directly calling the (cythonized) functions defined on the DataFrame/Series objects.

For example, you can use ndarray.max() along the first axis.

# Data borrowed from @DSM's post.
df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
df
   A  B
0  1 -2
1  2  8
2  3  1

df['C'] = df[['A', 'B']].values.max(1)
# Or, assuming "A" and "B" are the only columns, 
# df['C'] = df.values.max(1) 
df

   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3 

If your data has NaNs, you will need numpy.nanmax:

df['C'] = np.nanmax(df.values, axis=1)
df

   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3 

You can also use numpy.maximum.reduce. numpy.maximum is a ufunc (Universal Function), and every ufunc has a reduce:

df['C'] = np.maximum.reduce(df[['A', 'B']].values, axis=1)
# df['C'] = np.maximum.reduce(df[['A', 'B']], axis=1)
# df['C'] = np.maximum.reduce(df, axis=1)
df

   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3

np.maximum.reduce and np.max appear to be more or less the same (for most normal sized DataFrames)β€”and happen to be a shade faster than DataFrame.max. I imagine this difference roughly remains constant, and is due to internal overhead (indexing alignment, handling NaNs, etc).

The graph was generated using perfplot. Benchmarking code, for reference:

import numpy as np
import pandas as pd
import perfplot

np.random.seed(0)
df_ = pd.DataFrame(np.random.randn(5, 1000))

perfplot.show(
    setup=lambda n: pd.concat([df_] * n, ignore_index=True),
    kernels=[
        lambda df: df.assign(new=df.max(axis=1)),
        lambda df: df.assign(new=df.values.max(1)),
        lambda df: df.assign(new=np.nanmax(df.values, axis=1)),
        lambda df: df.assign(new=np.maximum.reduce(df.values, axis=1)),
    ],
    labels=['df.max', 'np.max', 'np.nanmax', 'np.maximum.reduce'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N (* len(df))',
    logx=True,
    logy=True)
Top answer
1 of 15
388

Use the pandas idxmax function. It's straightforward:

>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
          A         B         C
0  1.232853 -1.979459 -0.573626
1  0.140767  0.394940  1.068890
2  0.742023  1.343977 -0.579745
3  2.125299 -0.649328 -0.211692
4 -0.187253  1.908618 -1.862934
>>> df['A'].idxmax()
3
>>> df['B'].idxmax()
4
>>> df['C'].idxmax()
1
  • Alternatively you could also use numpy.argmax, such as numpy.argmax(df['A']) -- it provides the same thing, and appears at least as fast as idxmax in cursory observations.

  • idxmax() returns indices labels, not integers.

  • Example: if your index labels are strings, say rows 'a' through 'e', idxmax returns the label where the max occurs (e.g. 'd'), when you may actually want its integer position (e.g. 3).

  • if you want the integer position of that label within the Index you have to get it manually (which can be tricky now that duplicate row labels are allowed).


HISTORICAL NOTES:

  • idxmax() used to be called argmax() prior to 0.11
  • argmax was deprecated in later 0.x releases and removed entirely in 1.0.0
  • as of pandas 0.16, argmax still existed and performed the same function as idxmax (though it appeared to run more slowly)
  • the original argmax, however, returned the integer position within the index of the row holding the maximum element
  • pandas moved to using row labels instead of integer positions. Positional integer indices used to be very common, more common than labels, especially in applications where duplicate row labels are common.

For example, consider this toy DataFrame with a duplicate row label:

In [19]: dfrm
Out[19]: 
          A         B         C
a  0.143693  0.653810  0.586007
b  0.623582  0.312903  0.919076
c  0.165438  0.889809  0.000967
d  0.308245  0.787776  0.571195
e  0.870068  0.935626  0.606911
f  0.037602  0.855193  0.728495
g  0.605366  0.338105  0.696460
h  0.000000  0.090814  0.963927
i  0.688343  0.188468  0.352213
i  0.879000  0.105039  0.900260

In [20]: dfrm['A'].idxmax()
Out[20]: 'i'

In [21]: dfrm.loc[dfrm['A'].idxmax()]  # label-based lookup; very old pandas used .ix
Out[21]: 
          A         B         C
i  0.688343  0.188468  0.352213
i  0.879000  0.105039  0.900260

So here a naive use of idxmax is not sufficient, whereas the old form of argmax would correctly provide the positional location of the max row (in this case, position 9).

This is exactly one of those nasty kinds of bug-prone behaviors in dynamically typed languages that makes this sort of thing so unfortunate, and worth beating a dead horse over. If you are writing systems code and your system suddenly gets used on some data sets that are not cleaned properly before being joined, it's very easy to end up with duplicate row labels, especially string labels like a CUSIP or SEDOL identifier for financial assets. You can't easily use the type system to help you out, and you may not be able to enforce uniqueness on the index without running into unexpectedly missing data.

So you're left with hoping that your unit tests covered everything (they didn't, or more likely no one wrote any tests) -- otherwise (most likely) you're just left waiting to see if you happen to smack into this error at runtime, in which case you probably have to go drop many hours worth of work from the database you were outputting results to, bang your head against the wall in IPython trying to manually reproduce the problem, finally figuring out that it's because idxmax can only report the label of the max row, and then being disappointed that no standard function automatically gets the positions of the max row for you, writing a buggy implementation yourself, editing the code, and praying you don't run into the problem again.
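If you do need the integer position rather than the label, one workaround (a sketch, using the two duplicated-'i' values from the frame above) is to drop down to numpy, whose argmax is always positional:

```python
import pandas as pd

# Two rows sharing the duplicate label 'i', as in the frame above.
s = pd.Series([0.688343, 0.879000], index=["i", "i"], name="A")

# numpy's argmax is always positional, so duplicate labels are harmless.
pos = s.to_numpy().argmax()
print(pos)          # -> 1
print(s.iloc[pos])  # -> 0.879
```

Because `pos` is a plain integer, `iloc` retrieves exactly one row even when the label 'i' is ambiguous.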

2 of 15
103

You might also try idxmax:

In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])

In [6]: df
Out[6]: 
          A         B         C
0  2.001289  0.482561  1.579985
1 -0.991646 -0.387835  1.320236
2  0.143826 -1.096889  1.486508
3 -0.193056 -0.499020  1.536540
4 -2.083647 -3.074591  0.175772
5 -0.186138 -1.949731  0.287432
6 -0.480790 -1.771560 -0.930234
7  0.227383 -0.278253  2.102004
8 -0.002592  1.434192 -1.624915
9  0.404911 -2.167599 -0.452900

In [7]: df.idxmax()
Out[7]: 
A    0
B    8
C    7

e.g.

In [8]: df.loc[df['A'].idxmax()]
Out[8]: 
A    2.001289
B    0.482561
C    1.579985
Top answer
1 of 3
8

You can use where for filtering by conditions and then groupby by Series df['id'] with transform:

df['free_capacity'] = (df['Volume'].where(df['time_normalised'] <= 1.1)
                                   .groupby(df['id'])
                                   .transform('max'))
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity  
0           17.88  15.913662     580         1.593750          475.0  
1           17.88  15.865198     588         2.041667          475.0  
2           17.88  16.511613     475         0.666667          475.0  
3           17.88  16.882837     401         1.091458          467.0  
4           17.88  16.703004     438         1.479167          467.0  
5           17.88  16.553928     467         0.960417          467.0  

It is same if use where for creating new column Volume1 by your criteria:

df['Volume1'] = df['Volume'].where(df['time_normalised'] <= 1.1)
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  
0           17.88  15.913662     580         1.593750      NaN  
1           17.88  15.865198     588         2.041667      NaN  
2           17.88  16.511613     475         0.666667    475.0  
3           17.88  16.882837     401         1.091458    401.0  
4           17.88  16.703004     438         1.479167      NaN  
5           17.88  16.553928     467         0.960417    467.0 

Use groupby with transform with new column Volume1:

df['free_capacity'] = df.groupby('id')["Volume1"].transform('max')
print(df)
            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484   
1  27WESTBOUND     588    0.356364                    100  5.107143   
2  27WESTBOUND     475    0.287879                     64  6.256250   
3  27EASTBOUND     401    0.243030                     59  6.458065   
4  27EASTBOUND     438    0.265455                     46  7.049296   
5  27EASTBOUND     467    0.283030                     58  6.500000   

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  free_capacity  
0           17.88  15.913662     580         1.593750      NaN          475.0  
1           17.88  15.865198     588         2.041667      NaN          475.0  
2           17.88  16.511613     475         0.666667    475.0          475.0  
3           17.88  16.882837     401         1.091458    401.0          467.0  
4           17.88  16.703004     438         1.479167      NaN          467.0  
5           17.88  16.553928     467         0.960417    467.0          467.0  
2 of 3
1

Consider also a groupby().apply():

def maxtime(row):
    row['free_capacity'] = row[row['time_normalised'] <= 1.1]['Volume'].max()
    return row

df = df.groupby('id').apply(maxtime)