Assuming df has a unique index, this gives the row with the maximum value:
In [34]: df.loc[df['Value'].idxmax()]
Out[34]:
Country US
Place Kansas
Value 894
Name: 7
Note that idxmax returns index labels, not positions. So if the DataFrame has duplicates in the index, the label may not uniquely identify the row, and df.loc may return more than one row.
Therefore, if df does not have a unique index, you must make the index unique before proceeding as above. Depending on the DataFrame, sometimes you can use stack or set_index to make the index unique. Or, you can simply reset the index (so the rows become renumbered, starting at 0):
df = df.reset_index()
Answer from unutbu on Stack Overflow
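A minimal sketch of the duplicate-label pitfall described in the answer above, using made-up data (not from the answer) in which the index label 0 appears twice. idxmax returns that label, df.loc with it returns both rows, and after reset_index() the label identifies exactly one row.

```python
import pandas as pd

# Hypothetical data: the index label 0 is duplicated.
df = pd.DataFrame(
    {"Country": ["US", "US", "CA"],
     "Place": ["Kansas", "Ohio", "Quebec"],
     "Value": [894, 100, 500]},
    index=[0, 0, 1],
)

label = df["Value"].idxmax()   # label of the first maximum, here 0
ambiguous = df.loc[label]      # label 0 matches two rows, so this is a DataFrame
print(ambiguous)

# Renumber the rows 0..n-1; the old index is kept as a column named "index".
df_unique = df.reset_index()
row = df_unique.loc[df_unique["Value"].idxmax()]  # exactly one row (a Series)
print(row)
```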
df[df['Value']==df['Value'].max()]
This returns every row whose Value equals the column maximum, so all tied rows are included rather than only the first one.
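A short sketch with made-up data where two rows tie for the maximum: the boolean mask keeps both of them, whereas idxmax reports only the label of the first occurrence.

```python
import pandas as pd

# Hypothetical data with a tie for the maximum Value (894 appears twice).
df = pd.DataFrame(
    {"Place": ["Kansas", "Ohio", "Texas"], "Value": [894, 100, 894]}
)

# Boolean mask: every row equal to the column maximum is kept.
max_rows = df[df["Value"] == df["Value"].max()]
print(max_rows)

# idxmax: only the label of the first maximum is returned.
first_label = df["Value"].idxmax()
print(df.loc[first_label])
```

Which one to use depends on whether ties should all be reported (mask) or collapsed to a single row (idxmax).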
Pandas: Get the max value of a group ONLY if the value satisfies given conditions
I have a huge dataset.
Its columns are col, row, year, no, potveg, total and possible. Within each (col, row, year) group, I am trying to get the row with the maximum 'total' ONLY if its 'possible' value is TRUE. If the row with the max 'total' has possible = FALSE, I want the second-largest 'total' instead, and so on.
If every 'possible' value in a given year group is FALSE, I want to pick the row with the max 'total' among those FALSE rows, so that I don't skip any years.
For example, for the dataset below:
col row year no potveg total possible
-125 42.5 2015 1 9 697.3 FALSE
-125 42.5 2015 2 13 535.2 TRUE
-125 42.5 2015 3 15 82.3 TRUE
-125 42.5 2016 1 9 907.8 TRUE
-125 42.5 2016 2 13 137.6 FALSE
-125 42.5 2016 3 15 268.4 TRUE
-125 42.5 2017 1 9 961.9 FALSE
-125 42.5 2017 2 13 74.2 TRUE
-125 42.5 2017 3 15 248 TRUE
-125 42.5 2018 1 9 937.9 TRUE
-125 42.5 2018 2 13 575.6 TRUE
-125 42.5 2018 3 15 215.5 FALSE
-135 70.5 2015 1 8 697.3 FALSE
-135 70.5 2015 2 10 535.2 TRUE
-135 70.5 2015 3 19 82.3 TRUE
-135 70.5 2016 1 8 907.8 TRUE
-135 70.5 2016 2 10 137.6 FALSE
-135 70.5 2016 3 19 268.4 TRUE
-135 70.5 2017 1 8 961.9 FALSE
-135 70.5 2017 2 10 74.2 TRUE
-135 70.5 2017 3 19 248 TRUE
-135 70.5 2018 1 8 937.9 TRUE
-135 70.5 2018 2 10 575.6 TRUE
-135 70.5 2018 3 19 215.5 FALSE
-135 70.5 2019 1 8 937.9 FALSE
-135 70.5 2019 2 10 575.6 FALSE
-135 70.5 2019 3 19 215.5 FALSE
The output would be:
col row year no potveg total possible
-125 42.5 2015 2 13 535.2 TRUE
-125 42.5 2016 1 9 907.8 TRUE
-125 42.5 2017 3 15 248 TRUE
-125 42.5 2018 1 9 937.9 TRUE
-135 70.5 2015 2 10 535.2 TRUE
-135 70.5 2016 1 8 907.8 TRUE
-135 70.5 2017 3 19 248 TRUE
-135 70.5 2018 1 8 937.9 TRUE
-135 70.5 2019 1 8 937.9 FALSE
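The selection rule above can be sketched without idxmax: sort so that, inside each (col, row, year) group, TRUE rows come before FALSE rows and larger totals come first, then keep the first row of each group. This is an illustrative alternative, not something from the thread. It assumes 'possible' should be read as a boolean (the TRUE/FALSE strings are mapped with read_csv's true_values/false_values), and best_per_group is a hypothetical helper name.

```python
import io

import pandas as pd

# The sample table from the question, whitespace-separated.
DATA = """col row year no potveg total possible
-125 42.5 2015 1 9 697.3 FALSE
-125 42.5 2015 2 13 535.2 TRUE
-125 42.5 2015 3 15 82.3 TRUE
-125 42.5 2016 1 9 907.8 TRUE
-125 42.5 2016 2 13 137.6 FALSE
-125 42.5 2016 3 15 268.4 TRUE
-125 42.5 2017 1 9 961.9 FALSE
-125 42.5 2017 2 13 74.2 TRUE
-125 42.5 2017 3 15 248 TRUE
-125 42.5 2018 1 9 937.9 TRUE
-125 42.5 2018 2 13 575.6 TRUE
-125 42.5 2018 3 15 215.5 FALSE
-135 70.5 2015 1 8 697.3 FALSE
-135 70.5 2015 2 10 535.2 TRUE
-135 70.5 2015 3 19 82.3 TRUE
-135 70.5 2016 1 8 907.8 TRUE
-135 70.5 2016 2 10 137.6 FALSE
-135 70.5 2016 3 19 268.4 TRUE
-135 70.5 2017 1 8 961.9 FALSE
-135 70.5 2017 2 10 74.2 TRUE
-135 70.5 2017 3 19 248 TRUE
-135 70.5 2018 1 8 937.9 TRUE
-135 70.5 2018 2 10 575.6 TRUE
-135 70.5 2018 3 19 215.5 FALSE
-135 70.5 2019 1 8 937.9 FALSE
-135 70.5 2019 2 10 575.6 FALSE
-135 70.5 2019 3 19 215.5 FALSE
"""

df = pd.read_csv(io.StringIO(DATA), sep=r"\s+",
                 true_values=["TRUE"], false_values=["FALSE"])


def best_per_group(frame, keys=("col", "row", "year")):
    """Hypothetical helper: one row per group, preferring possible=True and
    then the largest total; all-FALSE groups fall back to their largest total."""
    # True sorts before False when descending; ties broken by larger total.
    ordered = frame.sort_values(["possible", "total"], ascending=[False, False])
    # The first row seen for each (col, row, year) is the preferred one.
    best = ordered.drop_duplicates(subset=list(keys), keep="first")
    # Put the chosen rows back in their original order.
    return best.sort_index()


result = best_per_group(df)
print(result)
```

On the sample data this yields the nine rows of the expected output, including the 2019 FALSE row for -135/70.5. The selection itself relies only on column values, not on index labels.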
I have tried the following:
# Separate the TRUE and FALSE rows by grouping on ['col', 'row', 'year', 'possible']
# and taking the idxmax of 'total' in each sub-group. Sorting the result on 'possible'
# in descending order puts the idxmax values (now in the 'total' column) for TRUE first.
idx = (df.groupby(['col', 'row', 'year', 'possible'], as_index=False)['total']
       .idxmax()
       .sort_values('possible', ascending=False)['total'])
# Then group again, this time only on ['col', 'row', 'year'], and take the first entry.
result = df.iloc[idx].groupby(['col', 'row', 'year']).first()
# Re-establish the original order with reindex, using the original df indexed by
# ['col', 'row', 'year'] with the duplicates removed first.
orig_index = df.set_index(['col', 'row', 'year']).index.drop_duplicates()
result_reordered = result.reindex(orig_index)

But I am still getting some years where the max value is not picked, which results in duplicates.
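One thing worth checking in the attempt above, offered as a guess rather than a confirmed diagnosis: idxmax returns index labels (the point made in the first answer), but df.iloc interprets them as positions. With the default 0..n-1 index the two coincide; if df's index is anything else (for example after filtering or sorting), df.iloc silently picks other rows. The sketch below uses hypothetical data with a reversed index and a variant of the same pipeline (without as_index=False), and compares iloc with loc.

```python
import pandas as pd

# Hypothetical data: one (col, row) location, two years, and an index that
# is NOT 0..n-1 (as might be left over from earlier filtering or sorting).
df = pd.DataFrame(
    {
        "col": [-125] * 6,
        "row": [42.5] * 6,
        "year": [2015, 2015, 2015, 2016, 2016, 2016],
        "total": [697.3, 535.2, 82.3, 907.8, 137.6, 268.4],
        "possible": [False, True, True, True, False, True],
    },
    index=[5, 4, 3, 2, 1, 0],
)
keys = ["col", "row", "year"]

# Label of the max 'total' in each (col, row, year, possible) sub-group,
# ordered so that possible=True sub-groups come first.
idx = (
    df.groupby(keys + ["possible"])["total"]
    .idxmax()
    .sort_index(level="possible", ascending=False)
)

# iloc treats the labels as positions and silently picks other rows.
wrong = df.iloc[idx.to_numpy()].groupby(keys).first()
# loc looks the labels up as labels, which is what idxmax returned.
right = df.loc[idx.to_numpy()].groupby(keys).first()
print(wrong)
print(right)
```

Here wrong ends up with 82.3 for 2015 and 137.6 (a FALSE row) for 2016, while right has 535.2 and 907.8. Alternatively, calling df.reset_index(drop=True) before the pipeline makes labels and positions coincide again.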