You can go with @greenAfrican example, if it's possible for you to rewrite your function. But if you don't want to rewrite your function, you can wrap it into anonymous function inside apply, like this:
>>> def fxy(x, y):
... return x * y
>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
A B newcolumn
0 10 20 200
1 20 30 600
2 30 10 300
Answer from roman on Stack OverflowYou can go with @greenAfrican example, if it's possible for you to rewrite your function. But if you don't want to rewrite your function, you can wrap it into anonymous function inside apply, like this:
>>> def fxy(x, y):
... return x * y
>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
A B newcolumn
0 10 20 200
1 20 30 600
2 30 10 300
Alternatively, you can use numpy underlying function:
>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
A B new_column
0 10 20 200
1 20 30 600
2 30 10 300
or vectorize arbitrary function in general case:
>>> def fx(x, y):
... return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
A B new_column
0 10 20 200
1 20 30 600
2 30 10 300
python - Passing a function with multiple arguments to DataFrame.apply - Stack Overflow
python - Apply function with two arguments to columns - Stack Overflow
Append multiple columns applying function that use multiple columns as attr (Pandas)
[Pandas] Why no argument required in apply() function?
Videos
There is a clean, one-line way of doing this in Pandas:
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.
Example with data (based on original question):
import pandas as pd
df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']
def get_sublist(sta,end):
return mylist[sta:end+1]
df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
Output of print(df):
ID col_1 col_2 col_3
0 1 0 1 [a, b]
1 2 2 4 [c, d, e]
2 3 3 5 [d, e, f]
If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:
df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
Here's an example using apply on the dataframe, which I am calling with axis = 1.
Note the difference is that instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.
In [49]: df
Out[49]:
0 1
0 1.000000 0.000000
1 -0.494375 0.570994
2 1.000000 0.000000
3 1.876360 -0.229738
4 1.000000 0.000000
In [50]: def f(x):
....: return x[0] + x[1]
....:
In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]:
0 1.000000
1 0.076619
2 1.000000
3 1.646622
4 1.000000
Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.
It's just the way you think it would be, apply accepts args and kwargs and passes them directly to some_func.
df.apply(some_func, var1='DOG', axis=1)
Or,
df.apply(some_func, args=('DOG', ), axis=1)
0 foo-x-DOG
1 bar-y-DOG
dtype: object
If for any reason that won't work for your use case, then you can always fallback to using a lambda:
df.apply(lambda row: some_func(row, 'DOG'), axis=1)
0 foo-x-DOG
1 bar-y-DOG
dtype: object
You should use vectorized logic:
df['C'] = df['A'] + '-' + df['B'] + '-DOG'
If you really want to use df.apply, which is just a thinly veiled loop, you can simply feed your arguments as additional parameters:
def some_func(row, var1):
return '{0}-{1}-{2}'.format(row['A'], row['B'], var1)
df['C'] = df.apply(some_func, var1='DOG', axis=1)
As per the docs, df.apply accepts both positional and keyword arguments.
Why not just do this?
df['NewCol'] = df.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']),
axis=1)
Rather than trying to pass the column as an argument as in your example, we now simply pass the appropriate entries in each row as argument, and store the result in 'NewCol'.
You don't really need a lambda function if you are defining the function outside:
def segmentMatch(vec):
RealTime = vec[0]
ResponseTime = vec[1]
if RealTime <= 566 and ResponseTime <= 566:
matchVar = 1
elif 566 < RealTime <= 1132 and 566 < ResponseTime <= 1132:
matchVar = 1
elif 1132 < RealTime <= 1698 and 1132 < ResponseTime <= 1698:
matchVar = 1
else:
matchVar = 0
return matchVar
df['NewCol'] = df[['TimeCol', 'ResponseCol']].apply(segmentMatch, axis=1)
If "segmentMatch" were to return a vector of 2 values instead, you could do the following:
def segmentMatch(vec):
......
return pd.Series((matchVar1, matchVar2))
df[['NewCol', 'NewCol2']] = df[['TimeCol','ResponseCol']].apply(segmentMatch, axis=1)
I have a DataFrame and a function, and I'd like to append 'c', 'd' col using a,b passed into function.
df = pd.DataFrame({
'a' : [1,2,3],
'b' : [4,5,6],})
def f(a,b):
return a+b, a-b
# What I assumed it should work, it did not.
df[['c', 'd']] = df.apply(lambda x: f(x.a, x.b), axis=1)
>>> ValueError: Columns must be same length as keyI know several ways that could make it work but it seems pretty hard-coded. I wonder how is the above not working and if it's possible to fix it, using apply method.