Answer from YXD on Stack Overflow

Inspired by Jeff's answer. This is the fastest method on my machine:

pd.Series(np.repeat(grp.mean().values, grp.count().values))
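A minimal self-contained sketch of the np.repeat trick, using a small assumed DataFrame (the original df and grouper are not shown in the thread). Note this relies on the rows already being sorted by the grouping key, so repeating each group mean count-times lines up with the original rows:

```python
import numpy as np
import pandas as pd

# Assumed data: rows are contiguous per key, as the repeat trick requires
df = pd.DataFrame({"key": list("aabbb"), "signal": [1.0, 3.0, 5.0, 7.0, 9.0]})
grp = df["signal"].groupby(df["key"])

# Repeat each group's mean once per row of that group
result = pd.Series(np.repeat(grp.mean().values, grp.count().values))
print(result.tolist())  # [2.0, 2.0, 7.0, 7.0, 7.0]
```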
Current method, using transform
In [44]: grp = df["signal"].groupby(g)
In [45]: result2 = df["signal"].groupby(g).transform(np.mean)
In [47]: %timeit df["signal"].groupby(g).transform(np.mean)
1 loops, best of 3: 535 ms per loop
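For reference, the transform call above as a runnable snippet on the same assumed data; transform returns one value per original row, aligned on the original index:

```python
import numpy as np
import pandas as pd

# Assumed data standing in for the thread's df and grouper g
df = pd.DataFrame({"key": list("aabbb"), "signal": [1.0, 3.0, 5.0, 7.0, 9.0]})
result2 = df["signal"].groupby(df["key"]).transform(np.mean)
print(result2.tolist())  # [2.0, 2.0, 7.0, 7.0, 7.0]
```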
Using 'broadcasting' of the results
In [43]: result = pd.concat([ Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
In [42]: %timeit pd.concat([ Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
10 loops, best of 3: 119 ms per loop
In [46]: result.equals(result2)
Out[46]: True
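The 'broadcasting' approach as a self-contained sketch, with assumed data. One thing worth flagging: grp.groups is keyed by group label, so indexing it with the counter from enumerate only works when the labels happen to be the integers 0..n-1:

```python
import numpy as np
import pandas as pd
from pandas import Series

# Assumed data; integer group labels 0..n-1 so enumerate lines up with grp.groups
df = pd.DataFrame({"signal": [1.0, 3.0, 5.0, 7.0, 9.0]})
g = pd.Series([0, 0, 1, 1, 1])
grp = df["signal"].groupby(g)

# Build one constant Series per group mean, then stack them back together
result = pd.concat(
    [Series([r] * len(grp.groups[i])) for i, r in enumerate(grp.mean().values)],
    ignore_index=True,
)
result2 = df["signal"].groupby(g).transform(np.mean)
print(result.equals(result2))  # True
```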
I think you might need to set the index on the broadcast result (it happens to work here because it's a default index):
result = pd.concat([ Series([r]*len(grp.groups[i])) for i, r in enumerate(grp.mean().values) ],ignore_index=True)
result.index = df.index
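The same sketch with a non-default index (assumed data), showing why the realignment step matters: concat with ignore_index=True produces a fresh 0..n-1 index, so it has to be set back to df.index before the result lines up with the frame:

```python
import numpy as np
import pandas as pd
from pandas import Series

# Assumed data with a non-default index
df = pd.DataFrame({"signal": [1.0, 3.0, 5.0, 7.0, 9.0]}, index=[10, 11, 12, 13, 14])
g = pd.Series([0, 0, 1, 1, 1], index=df.index)
grp = df["signal"].groupby(g)

result = pd.concat(
    [Series([r] * len(grp.groups[i])) for i, r in enumerate(grp.mean().values)],
    ignore_index=True,
)
result.index = df.index  # realign with the original frame

expected = df["signal"].groupby(g).transform(np.mean)
print(result.equals(expected))  # True
```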
From what I've learned, all that .transform() does is 'stretch' the aggregate function's result across every row of the corresponding group (i.e. the function still does its work, such as summing up the values, but instead of returning one result per group, it returns one result for every occurrence of a key value in the key column).
If the function is not aggregating in nature, the result seems to be the same whether or not .transform() is used. Is my understanding correct? Or are there some special cases I'm missing?
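That understanding can be checked directly. With assumed data: an aggregating function like sum is broadcast back to every row of its group, while an element-wise function like np.sqrt gives the same result with or without the groupby/transform:

```python
import numpy as np
import pandas as pd

# Assumed data chosen so the square roots are easy to eyeball
df = pd.DataFrame({"key": list("aabbb"), "signal": [1.0, 4.0, 9.0, 16.0, 25.0]})
g = df["key"]

# Aggregating function: one value per group, repeated for each row
agg = df["signal"].groupby(g).transform("sum")
print(agg.tolist())  # [5.0, 5.0, 50.0, 50.0, 50.0]

# Element-wise function: groupby/transform changes nothing
direct = np.sqrt(df["signal"])
via_transform = df["signal"].groupby(g).transform(np.sqrt)
print(direct.equals(via_transform))  # True
```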