Assuming df has a unique index, this gives the row with the maximum value:
In [34]: df.loc[df['Value'].idxmax()]
Out[34]:
Country US
Place Kansas
Value 894
Name: 7
Note that idxmax returns index labels. So if the DataFrame has duplicates in the index, the label may not uniquely identify the row, so df.loc may return more than one row.
Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):
df = df.reset_index()
Answer from unutbu on Stack OverflowAssuming df has a unique index, this gives the row with the maximum value:
In [34]: df.loc[df['Value'].idxmax()]
Out[34]:
Country US
Place Kansas
Value 894
Name: 7
Note that idxmax returns index labels. So if the DataFrame has duplicates in the index, the label may not uniquely identify the row, so df.loc may return more than one row.
Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):
df = df.reset_index()
df[df['Value']==df['Value'].max()]
This will return the entire row with max value
Videos
You can get the maximum like this:
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
>>> df
A B
0 1 -2
1 2 8
2 3 1
>>> df[["A", "B"]]
A B
0 1 -2
1 2 8
2 3 1
>>> df[["A", "B"]].max(axis=1)
0 1
1 8
2 3
and so:
>>> df["C"] = df[["A", "B"]].max(axis=1)
>>> df
A B C
0 1 -2 1
1 2 8 8
2 3 1 3
If you know that "A" and "B" are the only columns, you could even get away with
>>> df["C"] = df.max(axis=1)
And you could use .apply(max, axis=1) too, I guess.
@DSM's answer is perfectly fine in almost any normal scenario. But if you're the type of programmer who wants to go a little deeper than the surface level, you might be interested to know that it is a little faster to call numpy functions on the underlying .to_numpy() (or .values for <0.24) array instead of directly calling the (cythonized) functions defined on the DataFrame/Series objects.
For example, you can use ndarray.max() along the first axis.
# Data borrowed from @DSM's post.
df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
df
A B
0 1 -2
1 2 8
2 3 1
df['C'] = df[['A', 'B']].values.max(1)
# Or, assuming "A" and "B" are the only columns,
# df['C'] = df.values.max(1)
df
A B C
0 1 -2 1
1 2 8 8
2 3 1 3
If your data has NaNs, you will need numpy.nanmax:
df['C'] = np.nanmax(df.values, axis=1)
df
A B C
0 1 -2 1
1 2 8 8
2 3 1 3
You can also use numpy.maximum.reduce. numpy.maximum is a ufunc (Universal Function), and every ufunc has a reduce:
df['C'] = np.maximum.reduce(df['A', 'B']].values, axis=1)
# df['C'] = np.maximum.reduce(df[['A', 'B']], axis=1)
# df['C'] = np.maximum.reduce(df, axis=1)
df
A B C
0 1 -2 1
1 2 8 8
2 3 1 3

np.maximum.reduce and np.max appear to be more or less the same (for most normal sized DataFrames)โand happen to be a shade faster than DataFrame.max. I imagine this difference roughly remains constant, and is due to internal overhead (indexing alignment, handling NaNs, etc).
The graph was generated using perfplot. Benchmarking code, for reference:
import pandas as pd
import perfplot
np.random.seed(0)
df_ = pd.DataFrame(np.random.randn(5, 1000))
perfplot.show(
setup=lambda n: pd.concat([df_] * n, ignore_index=True),
kernels=[
lambda df: df.assign(new=df.max(axis=1)),
lambda df: df.assign(new=df.values.max(1)),
lambda df: df.assign(new=np.nanmax(df.values, axis=1)),
lambda df: df.assign(new=np.maximum.reduce(df.values, axis=1)),
],
labels=['df.max', 'np.max', 'np.maximum.reduce', 'np.nanmax'],
n_range=[2**k for k in range(0, 15)],
xlabel='N (* len(df))',
logx=True,
logy=True)