Very interesting. There is clearly a bug, at least in my version of pandas (0.25.1), with df.groupby(...).quantile(<array-like>). That code path is separate from the scalar one and seems to be broken even on very simple examples like:
import pandas as pd

df = pd.DataFrame(
    {"A": [0., 0., 0.], "B": ["X", "Y", "Z"]}
)
result = df.groupby("B").quantile([0.5, 0.9])  # broken on pandas 0.25.1
The two-row version of the same example, however, works:
df = pd.DataFrame(
{"A": [0., 0.], "B": ["X", "Y"]}
)
result = df.groupby("B").quantile([0.5, 0.9])
I would avoid using groupby with quantile on array-like objects until the code is fixed, even in cases where it works now, since it is likely to be error-prone.
Blame also shows several fairly recent changes there (10 and 16 months old) that touch exactly this code.
Answer from Alexander Pivovarov on Stack Overflow
Neither example in the answer from @alexander-pivovarov actually shows quantile at work. Every value is zero and each group has only one element, so the result is always zero. Or am I wrong here?
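A minimal sketch of that point, using Series.quantile on a stand-in for one such group (the names `single` and `result_single` are only illustrative, and this does not go through GroupBy.quantile): with a single value, any requested quantile is just that value.

```python
import pandas as pd

# Stand-in for one group from the examples above: a single value, 0.0.
single = pd.Series([0.0])

# With only one observation there is nothing to interpolate between,
# so every quantile collapses to that observation.
result_single = single.quantile([0.5, 0.9]).tolist()
print(result_single)  # [0.0, 0.0]
```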
I have pandas 0.25.3 and get useful results for:
import pandas as pd
df = pd.DataFrame(
{"A": [1., 2., 3., 4., 5., 6.], "B": ["X", "X", "Y", "Y", "Z", "Z"]}
)
result = df.groupby("B").quantile([0.5, 0.9])
print(result)
Output:
         A
B
X 0.5  1.5
  0.9  1.9
Y 0.5  3.5
  0.9  3.9
Z 0.5  5.5
  0.9  5.9
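To double-check numbers like these without relying on GroupBy.quantile at all, one option is to compute reference values per group with NumPy. This is only a sketch: the `expected` dictionary is a name introduced here, and it assumes NumPy's default linear interpolation, which is also the default in pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1., 2., 3., 4., 5., 6.], "B": ["X", "X", "Y", "Y", "Z", "Z"]}
)
q = [0.5, 0.9]

# Reference values: apply np.quantile to each group's raw values,
# bypassing GroupBy.quantile entirely.
expected = {
    label: np.quantile(group["A"].to_numpy(), q)
    for label, group in df.groupby("B")
}

for label, values in expected.items():
    print(label, values)
```

If the array-like groupby call returns different values on a given pandas version, that points at the bug discussed above rather than at the data.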
If it works with a single number passed to quantile(), you could hack something like
q = [0.2, 0.5, 0.9]
# one scalar quantile call per value in q, picking group 'X' and column 'A'
res = [df.groupby("B").quantile(p).loc["X", "A"] for p in q]
df_q = pd.DataFrame({"A": res, "quantiles": q})
print(df_q)
Output:
     A  quantiles
0  1.2        0.2
1  1.5        0.5
2  1.9        0.9
until it is fixed.
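The same per-quantile idea can be extended to all groups at once. The following is a rough sketch rather than a fix: it assumes scalar quantile calls behave correctly on your version, and the names `per_q` and `combined` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1., 2., 3., 4., 5., 6.], "B": ["X", "X", "Y", "Y", "Z", "Z"]}
)
q = [0.2, 0.5, 0.9]

# One scalar quantile call per requested probability; each call returns
# a Series indexed by the group label B.
per_q = {p: df.groupby("B")["A"].quantile(p) for p in q}

# Stack the pieces into one Series with a (quantile, B) MultiIndex, then
# reorder to (B, quantile) to mirror the layout of the array-like result.
combined = (
    pd.concat(per_q, names=["quantile", "B"])
    .swaplevel()
    .sort_index()
)
print(combined)
```

This runs one groupby pass per quantile, so it trades some speed for staying on the scalar code path.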