There is a difference. Every pd.DataFrame and pd.Series has an index whose labels need not be the consecutive integers [0 ... n), although that is the default when you don't specify an index during creation. Because the default labels coincide with the integer positions, people often confuse the two.
Consider this parabola:
import pandas as pd
import numpy as np
data = pd.Series(16 - np.arange(-4,5) ** 2)
0 0
1 7
2 12
3 15
4 16
5 15
6 12
7 7
8 0
dtype: int64
The labels are set to [0 ... 9), because we didn't specify them. In this case, both data.argmax() and data.idxmax() result in 4, because that's the integer position and label for 16.
However, if we filter out the odd values, the index isn't consecutive anymore:
filtered = data[data % 2 == 0]
0 0
2 12
4 16
6 12
8 0
dtype: int64
Here, filtered.argmax() returns 2 whereas filtered.idxmax() returns 4.
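This is particularly relevant when you want to look up rows in the original data using results extracted from filtered. The label from idxmax() is still valid there: data.loc[4] returns 16, the maximum found in filtered. The position from argmax() is not: data.iloc[2] returns 12.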
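A minimal sketch of that round trip, assuming pandas 1.0 or later (where Series.argmax is positional). The variable names pos and label are only illustrative:

```python
import numpy as np
import pandas as pd

data = pd.Series(16 - np.arange(-4, 5) ** 2)
filtered = data[data % 2 == 0]

pos = filtered.argmax()    # integer position within `filtered`
label = filtered.idxmax()  # index label, shared with `data`

# A label goes with .loc, a position goes with .iloc.
same_value = filtered.loc[label] == filtered.iloc[pos]

# The label is still valid in the unfiltered Series ...
via_label = data.loc[label]
# ... but the position points at a different row there.
via_position = data.iloc[pos]

print(pos, label, via_label, via_position)
```

In short: pair idxmax() with .loc and argmax() with .iloc, and only carry labels across differently filtered views of the same data.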
The documentation says: ‘The current behavior of Series.argmax is deprecated, use idxmax.’ Can you tell me what the differences between these two are? When should I use one or the other?
What are the advantages of one over the other?
What about a custom function? Something like
import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
def Max_Argmax(series):  # takes as input your series
    values = series.values  # store numeric values
    indexes = series.index  # store index labels
    Argmax = np.argmax(values)  # save integer position of max
    return values[Argmax], indexes[Argmax]  # return max and corresponding index label
(max, index) = Max_Argmax(s)
I ran it on my PC and got:
>>> s
a -1.854440
b 0.302282
c -0.630175
d -1.012799
e 0.239437
dtype: float64
>>> max
0.3022819091746019
>>> index
'b'
Hope it helps!
As Jon Clements mentioned:
In [3]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
In [4]: x, y = s.agg(['max', 'idxmax'])
In [5]: x
Out[5]: 1.6339096862287581
In [6]: y
Out[6]: 'b'
In [7]: s
Out[7]: a 1.245039
b 1.633910
c 0.619384
d 0.369604
e 1.009942
dtype: float64
In response to asking for a tuple:
def max_and_index(series):
    """Return a tuple of (max, idxmax) from a pandas.Series"""
    x, y = series.agg(['max', 'idxmax'])
    return x, y
t = max_and_index(s)
print(t)
(1.6339096862287581, 'b')
print(type(t))
<class 'tuple'>
Even smaller:
def max_and_idxmax(series):
    """Return a tuple of (max, idxmax) from a pandas.Series"""
    return series.max(), series.idxmax()
If you need speed, use the NumPy-based Max_Argmax function above. The following timings compare the approaches:
import pandas as pd
import numpy as np
def max_and_index(series):
    x, y = series.agg(['max', 'idxmax'])
    return x, y
def max_and_idxmax(series):
    return series.max(), series.idxmax()
def np_max_and_argmax(series):
    # note: returns the integer position of the max, not its index label
    return np.max(series.values), np.argmax(series.values)
def Max_Argmax(series):
    v = series.values
    i = series.index
    arg = np.argmax(v)
    return v[arg], i[arg]
# build test Series with 10**2 up to 10**8 random integers
a = []
for i in range(2,9,1):
    a.append(pd.Series(np.random.randint(0, 100, size=10**i)))
    print('{}\t{:>11,}'.format(i-2, 10**i))
# 0 100
# 1 1,000
# 2 10,000
# 3 100,000
# 4 1,000,000
# 5 10,000,000
# 6 100,000,000
idx = 5
%%timeit -n 2 -r 10
max_and_index(a[idx])
# 144 ms ± 5.45 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)
%%timeit -n 2 -r 10
max_and_idxmax(a[idx])
# 143 ms ± 5.14 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)
%%timeit -n 2 -r 10
Max_Argmax(a[idx])
# 9.89 ms ± 1.13 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)
%%timeit -n 2 -r 10
np_max_and_argmax(a[idx])
# 24.5 ms ± 1.74 ms per loop (mean ± std. dev. of 10 runs, 2 loops each)
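The %%timeit cells above only work in IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module. The function name max_argmax_numpy, the seed, and the Series size of 10**5 are assumptions chosen for illustration, and the numbers will differ from those reported above depending on machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd


def max_and_idxmax(series):
    """Pure pandas: maximum value and its index label."""
    return series.max(), series.idxmax()


def max_argmax_numpy(series):
    """NumPy on the underlying array, then map the position back to a label."""
    values = series.to_numpy()
    pos = values.argmax()
    return values[pos], series.index[pos]


rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
s = pd.Series(rng.integers(0, 100, size=10**5))

# Both approaches should agree before their timings are compared.
pandas_result = max_and_idxmax(s)
numpy_result = max_argmax_numpy(s)

for fn in (max_and_idxmax, max_argmax_numpy):
    # best of 5 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(s), number=10, repeat=5)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```

Both functions return the first occurrence of the maximum when there are ties, so their results are directly comparable.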