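You can use the apply() method:
import numpy as np
import pandas as pd
np.random.seed(0)
people2 = pd.DataFrame(np.random.randn(5, 5),
                       columns=['a', 'b', 'c', 'd', 'e'],
                       index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']
# group_keys=False keeps the original row labels in the result
Grouped = people2.groupby(key, group_keys=False)
def f(df):
    df = df.copy()  # avoid mutating the group that pandas passes in
    # per-group difference of means, scaled by each row's c value
    df["f"] = (df.a.mean() - df.b.mean()) * df.c
    return df
people2 = Grouped.apply(f)
print(people2)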
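If you want a more general method:
Grouped = people2.groupby(key, group_keys=False)
def f(a, b, c, **kw):
    # **df unpacks the group's columns as keyword arguments; unused ones land in kw
    return (a.mean() - b.mean()) * c
people2["f"] = Grouped.apply(lambda df: f(**df))
print(people2)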
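As an alternative to apply(), the same column can be built with transform('mean'), which broadcasts each group's mean back onto the original rows. This is a minimal sketch and not part of the original answer; it reuses the people2 and key names from above, and a_mean and b_mean are illustrative names only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
people2 = pd.DataFrame(np.random.randn(5, 5),
                       columns=['a', 'b', 'c', 'd', 'e'],
                       index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

grouped = people2.groupby(key)
# transform('mean') returns a Series aligned with people2's index,
# holding the mean of the group each row belongs to
a_mean = grouped['a'].transform('mean')
b_mean = grouped['b'].transform('mean')
people2['f'] = (a_mean - b_mean) * people2['c']
print(people2)
```

Because nothing is concatenated back together, the row order and index of people2 stay untouched.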
Answer from HYRY on Stack Overflow
This is based on the answer provided by HYRY (thanks), who made me see how this could be achieved. My version only generalises the function so that the column names are passed as arguments when it is called. I think the function has to be called through a lambda, though:
import pandas as pd
import numpy as np
people = pd.DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']
Grouped = people.groupby(key, group_keys=False)
def FUNC(df, col1, col2, col3, col4):
    df = df.copy()  # work on a copy of the group
    df[col1] = (df[col2].mean() - df[col3].mean()) * df[col4]
    return df
# apply() hands each group to FUNC as a whole DataFrame; in recent pandas
# versions transform() may call the function column by column instead
people2 = Grouped.apply(lambda x: FUNC(x, 'f', 'a', 'b', 'c'))
This seems to me the best way of doing this that I have seen. The entire DataFrame for each group is passed to the function as x, and the column names are supplied as arguments.
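The lambda is one option. GroupBy.apply also forwards extra positional arguments to the function, so the column names can be passed directly. A minimal sketch under that assumption: add_scaled_diff is a hypothetical name standing in for FUNC, and the random seed is added only to make the run repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

def add_scaled_diff(df, out_col, mean_col1, mean_col2, scale_col):
    """Return a copy of the group with out_col = (mean1 - mean2) * scale_col."""
    df = df.copy()
    df[out_col] = (df[mean_col1].mean() - df[mean_col2].mean()) * df[scale_col]
    return df

# group_keys=False keeps the original row labels as the index
grouped = people.groupby(key, group_keys=False)
# arguments after the function are passed through to it for every group
people2 = grouped.apply(add_scaled_diff, 'f', 'a', 'b', 'c')
print(people2)
```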
For this particular case you could do:
g = df.groupby(['c', 'd'])
df['e'] = g.a.transform('sum') + g.b.transform('sum')
df
# outputs
   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22
If the final result can be built as a linear combination of independent transforms on the same groupby, this method works.
Otherwise, use a groupby-apply and then merge the result back into the original df.
Example:
_ = df.groupby(['c', 'd'])[['a', 'b']].apply(lambda x: (x.a + x.b).sum()).rename('e').reset_index()
df.merge(_, on=['c', 'd'])
# same output as above (if df already has an 'e' column, drop it first, or merge will add suffixes)
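A self-contained sketch of the aggregate-then-merge route. The input frame is reconstructed from the output table shown above, and the apply/lambda is swapped for a plain grouped sum, which gives the same per-group total in this case. The names totals and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# one row per (c, d) group holding sum(a) + sum(b)
totals = (df.groupby(['c', 'd'])[['a', 'b']].sum()
            .sum(axis=1)
            .rename('e')
            .reset_index())

# how='left' keeps the original row order of df
result = df.merge(totals, on=['c', 'd'], how='left')
print(result)
```

The merge is what spreads the one-row-per-group totals back over every row of the group, which transform does implicitly.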
You can use GroupBy + transform with sum twice:
df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(axis=1)
print(df)
   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22
Suppose you have a DataFrame named df.
You can first make a list of the numeric dtypes, then loop over the matching columns:
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
for c in [c for c in df.columns if df[c].dtype in numerics]:
    df[c] = np.log10(df[c])
Or, a one-liner with a lambda and np.issubdtype:
numeric_df = df.apply(lambda x: np.log10(x) if np.issubdtype(x.dtype, np.number) else x)
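Another way to pick out the numeric columns is DataFrame.select_dtypes, which avoids spelling out the dtype names. A minimal sketch with made-up sample data; the column names x, label, and z are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 10, 100],
                   'label': ['a', 'b', 'c'],
                   'z': [1000.0, 10.0, 1.0]})

# include='number' selects integer and float columns, not strings
num_cols = df.select_dtypes(include='number').columns

out = df.copy()
# np.log10 on a DataFrame works element-wise and keeps the labels
out[num_cols] = np.log10(df[num_cols])
print(out)
```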
If most columns are numeric, it might make sense to just try it and skip any column where it does not work:
for column in df.columns:
    try:
        df[column] = np.log10(df[column])
    except (TypeError, ValueError, AttributeError):
        # non-numeric column (recent NumPy raises TypeError here); leave it unchanged
        pass
If you want to, you could wrap it in a function, of course.
If all columns are numeric, you can simply do:
df_log10 = np.log10(df)
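The try-and-skip loop from this answer can be wrapped in a small helper, as suggested. This is a sketch only: log10_where_possible is a hypothetical name, and returning a copy instead of modifying df in place is a design choice not stated in the original.

```python
import numpy as np
import pandas as pd

def log10_where_possible(df):
    """Return a copy of df with log10 applied to every column that supports it."""
    out = df.copy()
    for column in out.columns:
        try:
            out[column] = np.log10(out[column])
        except (TypeError, ValueError, AttributeError):
            # column cannot be log-transformed (e.g. strings); keep it as is
            pass
    return out

df = pd.DataFrame({'x': [1.0, 10.0, 100.0],
                   'label': pd.Series(['a', 'b', 'c'], dtype=object)})
df_log10 = log10_where_possible(df)
print(df_log10)
```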