There is a clean, one-line way of doing this in Pandas:
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.
Example with data (based on original question):
import pandas as pd
df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']
def get_sublist(sta, end):
    return mylist[sta:end+1]
df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
Output of print(df):
  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:
df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
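To see why bracket indexing matters, here is a small sketch. The column names 'col 1' and 'size' are made up for illustration. 'size' is chosen because it collides with the built-in Series.size attribute, so x.size would return the number of entries in the row rather than the value stored in that column.

```python
import pandas as pd

# Hypothetical column names: one contains a space, the other
# shadows the built-in Series.size attribute.
df = pd.DataFrame({'col 1': [0, 2, 3], 'size': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    # Inclusive slice of mylist from position sta to position end.
    return mylist[sta:end + 1]

# Bracket indexing always looks up the column label, so both
# awkward names are handled the same way.
df['col_3'] = df.apply(lambda x: get_sublist(x['col 1'], x['size']), axis=1)
print(df)
```

Attribute access is fine for simple names like col_1, but square brackets are the safer default when you do not control the column names.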
Answer from ajrwhite on Stack Overflow
Here's an example using apply on the dataframe, which I am calling with axis = 1.
The difference is that, instead of trying to pass two values to the function f, you rewrite the function to accept a pandas Series object and then index the Series to get the values needed.
In [49]: df
Out[49]:
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000
In [50]: def f(x):
   ....:     return x[0] + x[1]
   ....:
In [51]: df.apply(f, axis=1)  # passes a Series object, row-wise
Out[51]:
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000
Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.
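As a rough sketch of the group-based variant mentioned above: the column names 'group', 'x' and 'y' and the helper weighted_sum are invented for illustration and are not part of the original answer. The function receives the sub-DataFrame for each group, so it can combine several columns at once.

```python
import pandas as pd

# Hypothetical data: two groups, two numeric columns.
df = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b'],
    'x': [1.0, 2.0, 3.0, 4.0],
    'y': [10.0, 20.0, 30.0, 40.0],
})

def weighted_sum(g):
    # g is the sub-DataFrame holding the rows of one group.
    return (g['x'] * g['y']).sum()

# Select the value columns explicitly so the function only sees 'x' and 'y'.
result = df.groupby('group')[['x', 'y']].apply(weighted_sum)
print(result)
```

The result is a Series indexed by group label, with one value per group.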
Applying function to values in multiple columns in Pandas Dataframe.
As for defining the columns twice, you should define the list of columns to be zfilled once and then reference it in both places. Then you can use applymap (deprecated since pandas 2.1 in favor of DataFrame.map) and ditch one lambda:
zfill_cols = ['Date', 'Departure time', 'Arrival time']
df[zfill_cols] = df[zfill_cols].applymap(lambda s: s.zfill(4))
Or on the entire dataframe:
df = df.applymap(lambda s: s.zfill(4))
EDIT: You can also use DataFrame.apply with Series.str.zfill, which is probably faster because it takes advantage of vectorized string methods (unlike Series.apply and DataFrame.applymap):
df[zfill_cols] = df[zfill_cols].apply(lambda se: se.str.zfill(4))
Or
df = df.apply(lambda se: se.str.zfill(4))
More on reddit.com
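Here is a self-contained sketch of the Series.str.zfill approach. The sample values and the extra 'Flight' column are assumptions made up for the example; only the three column names come from the question.

```python
import pandas as pd

# Hypothetical sample data: all values are strings, as in the question.
df = pd.DataFrame({
    'Date': ['101', '1231'],
    'Departure time': ['5', '930'],
    'Arrival time': ['45', '1015'],
    'Flight': ['AB1', 'CD22'],  # extra column that must stay untouched
})

# Name the columns once and reuse the list on both sides.
zfill_cols = ['Date', 'Departure time', 'Arrival time']

# DataFrame.apply hands each column to the lambda as a Series,
# and Series.str.zfill pads every string in that column.
df[zfill_cols] = df[zfill_cols].apply(lambda se: se.str.zfill(4))
print(df)
```

Only the listed columns are padded to four characters; 'Flight' is left as it was.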
Append multiple columns applying function that use multiple columns as attr (Pandas)
I have a Dataframe with strings and I want to apply zfill to strings in some of the columns. Here's how I do it:
df[['Date', 'Departure time','Arrival time']] = df[['Date', 'Departure time','Arrival time']].apply(lambda x: x.apply(lambda y: y.zfill(4)))
It works as intended, but my question is: am I doing it right? Do I really have to put an apply inside an apply? And do I have to write this huge df[['Date', 'Departure time','Arrival time']] twice (imagine if I had 20 columns to modify)? Is there a cleaner way to do it?
I have a DataFrame and a function, and I'd like to append columns 'c' and 'd', computed by passing columns 'a' and 'b' into the function.
import pandas as pd
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})
def f(a, b):
    return a + b, a - b
# What I assumed would work, but it did not.
df[['c', 'd']] = df.apply(lambda x: f(x.a, x.b), axis=1)
>>> ValueError: Columns must be same length as key
I know several ways to make it work, but they all seem pretty hard-coded. I wonder why the above does not work and whether it can be fixed while still using the apply method.
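One likely explanation, offered as a sketch rather than a definitive diagnosis: with axis=1, apply collects each returned tuple into a single Series of tuples, so there is only one column of results to assign to the two keys 'c' and 'd'. Passing result_type='expand' to DataFrame.apply asks pandas to spread each tuple across separate columns instead.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})

def f(a, b):
    # Returns two values per row as a tuple.
    return a + b, a - b

# result_type='expand' turns each returned tuple into one row of a
# two-column DataFrame, which matches the two target columns.
df[['c', 'd']] = df.apply(lambda x: f(x.a, x.b), axis=1, result_type='expand')
print(df)
```

For simple arithmetic like this, plain column operations (df['a'] + df['b']) avoid apply altogether, but the pattern above carries over to functions that cannot be vectorized.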