You can go with @greenAfrican's example if it's possible for you to rewrite your function. But if you don't want to rewrite your function, you can wrap it in an anonymous function inside apply, like this:
>>> def fxy(x, y):
...     return x * y
...
>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
    A   B  newcolumn
0  10  20        200
1  20  30        600
2  30  10        300
Answer from roman on Stack Overflow
Alternatively, you can use the underlying NumPy function:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
Or, in the general case, vectorize an arbitrary function:
>>> def fx(x, y):
...     return x * y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
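To compare the three approaches above side by side, here is a minimal self-contained sketch. It rebuilds the same three-row frame and writes each result to its own column; the column names via_apply, via_ufunc and via_vectorize are made up for illustration. Keep in mind that the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it is not a performance optimisation.

```python
import numpy as np
import pandas as pd

def fxy(x, y):
    return x * y

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

# 1. Row-wise apply, wrapping the two-argument function in a lambda
df["via_apply"] = df.apply(lambda row: fxy(row["A"], row["B"]), axis=1)

# 2. The NumPy ufunc, which works on whole columns at once
df["via_ufunc"] = np.multiply(df["A"], df["B"])

# 3. np.vectorize, which calls fxy once per pair of elements
df["via_vectorize"] = np.vectorize(fxy)(df["A"], df["B"])

print(df)
```

All three columns should hold the same values; for simple arithmetic like this, the ufunc (or just df["A"] * df["B"]) is usually the one to prefer.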
Why not just do this?
df['NewCol'] = df.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']),
                        axis=1)
Rather than trying to pass the whole columns as arguments, as in your example, we now pass the appropriate entries of each row as arguments and store the result in 'NewCol'.
You don't really need a lambda function if you are defining the function outside:
def segmentMatch(vec):
    # vec is one row as a Series; use .iloc for positional access
    RealTime = vec.iloc[0]
    ResponseTime = vec.iloc[1]
    if RealTime <= 566 and ResponseTime <= 566:
        matchVar = 1
    elif 566 < RealTime <= 1132 and 566 < ResponseTime <= 1132:
        matchVar = 1
    elif 1132 < RealTime <= 1698 and 1132 < ResponseTime <= 1698:
        matchVar = 1
    else:
        matchVar = 0
    return matchVar

df['NewCol'] = df[['TimeCol', 'ResponseCol']].apply(segmentMatch, axis=1)
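If the row-wise version turns out to be slow, the same "are both values in the same segment?" test can be sketched without apply. This is a swapped-in technique, not part of the answer above: it assumes the segment edges 566, 1132 and 1698 from the if/elif chain and uses pd.cut to bin each column. The sample data is invented.

```python
import numpy as np
import pandas as pd

# Invented sample data
df = pd.DataFrame({
    "TimeCol": [100, 600, 1200, 2000, 500],
    "ResponseCol": [200, 1000, 500, 2000, 566],
})

# Edges from the if/elif chain: (-inf, 566], (566, 1132], (1132, 1698]
edges = [-np.inf, 566, 1132, 1698]

# labels=False returns the bin number; values above 1698 fall outside and become NaN
time_bin = pd.cut(df["TimeCol"], bins=edges, labels=False)
resp_bin = pd.cut(df["ResponseCol"], bins=edges, labels=False)

# NaN never compares equal, so out-of-range rows get 0, like the else branch
df["NewCol"] = (time_bin == resp_bin).astype(int)

print(df)
```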
If "segmentMatch" were to return a vector of 2 values instead, you could do the following:
def segmentMatch(vec):
    ......
    return pd.Series((matchVar1, matchVar2))

df[['NewCol', 'NewCol2']] = df[['TimeCol','ResponseCol']].apply(segmentMatch, axis=1)
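Since the body of the two-output function is elided above, here is a runnable sketch of the same pattern with a made-up function. segment_stats and the values it computes are purely hypothetical; the point is only that returning a pd.Series with named entries from a row-wise apply yields a DataFrame with one column per entry, which can then be joined back.

```python
import pandas as pd

# Invented sample data
df = pd.DataFrame({"TimeCol": [100, 600, 1200], "ResponseCol": [200, 1000, 500]})

def segment_stats(vec):
    # Hypothetical two-output function: the difference and the larger of the two values
    real_time = vec["TimeCol"]
    response_time = vec["ResponseCol"]
    return pd.Series({
        "NewCol": response_time - real_time,
        "NewCol2": max(real_time, response_time),
    })

# Each returned Series becomes one row of a two-column DataFrame
new_cols = df[["TimeCol", "ResponseCol"]].apply(segment_stats, axis=1)
df = df.join(new_cols)

print(df)
```

Naming the entries of the returned Series means the new columns arrive already labelled, so nothing depends on the order of the values.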
import pandas as pd

myDF = pd.DataFrame({'student_names': ['Monserta ruff', 'Gonzalo Fryer', 'Kris Venmeter'],
                     'grades': [34, 58, 100]})

def assign_letter(row):
    if row >= 90:
        result = 'A**'
    elif row >= 50:
        result = 'C'
    else:
        result = 'F'
    return result

myDF['letter-grades'] = myDF['grades'].apply(assign_letter)  # no argument required?
myDF

Why doesn't the function assign_letter(row) need an argument (no parentheses), yet it still gives the correct resulting DataFrame (as below)?
| student_names | grades | letter-grades |
|---|---|---|
| Monserta ruff | 34 | F |
| Gonzalo Fryer | 58 | C |
| Kris Venmeter | 100 | A** |
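One way to see what is going on: writing assign_letter without parentheses passes the function object itself, and apply then calls it once per element, supplying each value as the argument. The sketch below is a simplified copy of the example (plain 'A' instead of 'A**') that compares apply with an explicit loop doing the same calls.

```python
import pandas as pd

def assign_letter(grade):
    if grade >= 90:
        return "A"
    elif grade >= 50:
        return "C"
    return "F"

grades = pd.Series([34, 58, 100])

# apply receives the function object and calls it for every element
via_apply = grades.apply(assign_letter)

# The same calls written out by hand
via_loop = [assign_letter(g) for g in grades]

print(via_apply.tolist())
print(via_loop)
```

Writing grades.apply(assign_letter()) instead would try to call the function immediately with no argument and raise a TypeError before apply ever runs.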
It works just the way you would expect: apply accepts args and kwargs and passes them directly to some_func.
df.apply(some_func, var1='DOG', axis=1)
Or,
df.apply(some_func, args=('DOG', ), axis=1)
0 foo-x-DOG
1 bar-y-DOG
dtype: object
If for any reason that won't work for your use case, then you can always fallback to using a lambda:
df.apply(lambda row: some_func(row, 'DOG'), axis=1)
0 foo-x-DOG
1 bar-y-DOG
dtype: object
You should use vectorized logic:
df['C'] = df['A'] + '-' + df['B'] + '-DOG'
If you really want to use df.apply, which is just a thinly veiled loop, you can simply feed your arguments as additional parameters:
def some_func(row, var1):
    return '{0}-{1}-{2}'.format(row['A'], row['B'], var1)

df['C'] = df.apply(some_func, var1='DOG', axis=1)
As per the docs, df.apply accepts both positional and keyword arguments.
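For reference, the variants discussed above can be run side by side. The two-row frame here is an assumption, reconstructed from the foo-x-DOG / bar-y-DOG output shown earlier; the variable names are illustrative.

```python
import pandas as pd

# Assumed input, inferred from the printed output above
df = pd.DataFrame({"A": ["foo", "bar"], "B": ["x", "y"]})

def some_func(row, var1):
    return "{0}-{1}-{2}".format(row["A"], row["B"], var1)

# Extra keyword arguments are forwarded to some_func
by_keyword = df.apply(some_func, var1="DOG", axis=1)

# Extra positional arguments go in the args tuple
by_args = df.apply(some_func, args=("DOG",), axis=1)

# A lambda that closes over the extra value
by_lambda = df.apply(lambda row: some_func(row, "DOG"), axis=1)

# Plain vectorized string concatenation, no apply at all
vectorized = df["A"] + "-" + df["B"] + "-DOG"

print(by_keyword.tolist())
```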
There is a clean, one-line way of doing this in Pandas:
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.
Example with data (based on original question):
import pandas as pd
df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']
def get_sublist(sta, end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
Output of print(df):
  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:
df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
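To make the name-clash case concrete, here is a small sketch with invented columns: one is called count, which is also a Series method, and the other contains a space. The function total is hypothetical.

```python
import pandas as pd

# Invented data: "count" clashes with Series.count, "unit price" contains a space
df = pd.DataFrame({"count": [2, 3], "unit price": [1.5, 4.0]})

def total(n, price):
    return n * price

first_row = df.iloc[0]

# Attribute access finds the Series.count method, not the column value
print(callable(first_row.count))

# Square brackets always look up the column label
df["total"] = df.apply(lambda x: total(x["count"], x["unit price"]), axis=1)

print(df)
```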
Here's an example using apply on the dataframe, which I am calling with axis = 1.
Note the difference: instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.
In [49]: df
Out[49]:
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):
   ....:     return x[0] + x[1]
   ....:

In [51]: df.apply(f, axis=1)  # passes a Series object, row-wise
Out[51]:
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000
Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.
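As a rough illustration of the group-object idea, the sketch below computes a weighted mean per group from two columns. The data, the column names and weighted_mean are all invented; selecting the two value columns before calling apply keeps the grouping column out of what the function receives.

```python
import pandas as pd

# Invented sample data
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "weight": [1, 3, 2, 2],
    "score": [10.0, 20.0, 30.0, 50.0],
})

def weighted_mean(group):
    # group is a DataFrame holding the selected columns for one team
    return (group["weight"] * group["score"]).sum() / group["weight"].sum()

# One result per team, computed from two columns at once
result = df.groupby("team")[["weight", "score"]].apply(weighted_mean)

print(result)
```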
The concatenation approach is fine if it works, but in my opinion it does not answer the question, because it merges the two arguments into one.
Here is a way to pass two arguments through apply:
df['PageCLass'] = df[['PageClass','Rev']].apply(lambda x: PageClassify.page_classify(*x), axis=1)
I don't know what the page_classify method looks like but if it takes two arguments the above should work. Does this work for you?
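Since the real page_classify isn't shown, here is the unpacking trick with a stand-in. The function below and the sample data are hypothetical; the only thing being demonstrated is that *x spreads the selected row values into separate positional arguments, in column order.

```python
import pandas as pd

def page_classify(page_class, rev):
    # Hypothetical stand-in for PageClassify.page_classify
    return f"{page_class}:{rev}"

# Invented sample data
df = pd.DataFrame({"PageClass": ["home", "cart"], "Rev": [10, 0]})

# Each row of the two selected columns is unpacked into (page_class, rev)
df["Label"] = df[["PageClass", "Rev"]].apply(lambda x: page_classify(*x), axis=1)

print(df)
```

The order of the columns in the selection matters here, because unpacking is positional.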
Assuming you want to just do this row by row, the following should work:
df['PageCLass'] = (df['PageClass'] + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))
Here, you are simply concatenating the two dataframe columns together and then you can apply the function to each row in the new column. If you need to check the values of PageClass and Rev as separate arguments, you could also add a delimiter (e.g. '\t') to the concatenation and then simply split on that inside the function:
df['PageCLass'] = (df['PageClass'] + '\t' + df['Rev'].apply(str)).apply(lambda x: PageClassify.page_classify(x))
Hope this helps!
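For completeness, the delimiter variant could look roughly like this. page_classify_joined is a hypothetical stand-in that splits the joined string back into its two parts; the sample data is invented. This only works if the tab character never appears in PageClass itself.

```python
import pandas as pd

def page_classify_joined(joined):
    # Hypothetical stand-in: recover both values from the tab-joined string
    page_class, rev = joined.split("\t")
    return f"{page_class}:{int(rev)}"

# Invented sample data
df = pd.DataFrame({"PageClass": ["home", "cart"], "Rev": [10, 0]})

# Join the two columns with a tab, then classify each joined string
joined = df["PageClass"] + "\t" + df["Rev"].apply(str)
df["Label"] = joined.apply(page_classify_joined)

print(df)
```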