pandas apply lambda multiple columns

How to apply a function to two columns of Pandas dataframe

stackoverflow.com › questions › 13331698 › how-to-apply-a-function-to-two-columns-of-pandas-dataframe

There is a clean, one-line way of doing this in Pandas:

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.

Example with data (based on original question):

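import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    # Return the elements of mylist from index sta through index end (inclusive).
    return mylist[sta:end + 1]

# axis=1 passes each row to the lambda as a Series, so columns are reachable by name.
df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)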

Output of print(df):

  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
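
For comparison, here is a minimal sketch of an alternative that is not part of the answer above: pairing the two columns with zip and calling the function in a list comprehension. The column name col_3_zip is only illustrative. This avoids building a Series for every row, which may matter on large frames, though that is an assumption you should measure on your own data.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    # Return the elements of mylist from index sta through index end (inclusive).
    return mylist[sta:end + 1]

# Walk the two columns in lockstep and call the function on each pair of values.
# 'col_3_zip' is a hypothetical column name chosen for this sketch.
df['col_3_zip'] = [get_sublist(s, e) for s, e in zip(df['col_1'], df['col_2'])]

print(df)
```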

If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:

df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)

Answer from ajrwhite on Stack Overflow
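
To illustrate the point about square brackets, the sketch below uses made-up data with one column whose name contains a space and one column named count, which collides with the Series.count method. The function f and the column names are assumptions chosen for illustration, not taken from the answer.

```python
import pandas as pd

# Hypothetical data: 'col 1' has a space, 'count' shadows the Series.count method.
df = pd.DataFrame({'col 1': [1, 2, 3], 'count': [10, 20, 30]})

def f(a, b):
    # Stand-in for any user-defined function of two values.
    return a + b

# Bracket indexing always looks up the column label.
df['total'] = df.apply(lambda x: f(x['col 1'], x['count']), axis=1)

# Attribute access on a row finds the method, not the column value.
first_row = df.iloc[0]
attribute_is_method = callable(first_row.count)

print(df)
print(attribute_is_method)
```

With attribute access, x.count would hand f a bound method rather than the number in that row, so brackets are the safer default whenever a column name might clash.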

Stack Overflow

python - How to apply a function to two columns of Pandas dataframe - Stack Overflow

Top answer (1 of 16, score 717): the answer by ajrwhite, quoted in full above.

Answer 2 of 16 (score 483)

Here's an example that calls apply on the dataframe with axis=1.

The difference is that, instead of trying to pass two values to the function f, you rewrite f to accept a single pandas Series object (one row) and index into that Series to get the values you need.

In [49]: df
Out[49]: 
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):    
   ....:  return x[0] + x[1]  
   ....:  

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]: 
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000

Depending on your use case, it is sometimes helpful to create a pandas groupby object and then call apply on each group.
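
A minimal sketch of that groupby idea, using made-up data: the function receives each group as a small DataFrame and combines two of its columns. The column names key, value, and weight and the helper weighted_mean are assumptions for illustration. Selecting the two columns before calling apply keeps the grouping column out of the frame passed to the function.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups, each row has a value and a weight.
df = pd.DataFrame({
    'key': ['a', 'a', 'b'],
    'value': [1.0, 3.0, 10.0],
    'weight': [1.0, 3.0, 2.0],
})

def weighted_mean(group):
    # group is a DataFrame holding the rows of one key.
    return np.average(group['value'], weights=group['weight'])

# One result per group, indexed by key.
result = df.groupby('key')[['value', 'weight']].apply(weighted_mean)

print(result)
```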

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.apply.html

pandas.DataFrame.apply — pandas documentation - PyData

The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type="broadcast")
   A  B
0  1  2
1  1  2
2  1  2

Advanced users can speed up their code by using a Just-in-time (JIT) compiler with apply. The main JIT compilers available for pandas are Numba and Bodo.
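
As a rough illustration of the result_type parameter mentioned in the documentation excerpt, the sketch below contrasts "broadcast" with "expand" on a small frame. The frame contents and the output column names sum and product are assumptions chosen for this example.

```python
import pandas as pd

# Small hypothetical frame with two integer columns.
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

# 'broadcast' keeps the original shape and column names.
broadcast = df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')

# 'expand' turns each returned list into separate columns (labelled 0, 1, ...).
expanded = df.apply(
    lambda x: [x['A'] + x['B'], x['A'] * x['B']],
    axis=1,
    result_type='expand',
)
expanded.columns = ['sum', 'product']  # hypothetical names for readability

print(broadcast)
print(expanded)
```

"expand" is handy when one function of two columns should produce several new columns in a single pass.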

Discussions

Applying function to values in multiple columns in Pandas Dataframe.

As for defining the columns twice: define the list of columns to be zero-filled once and reference it in both places. You can then use applymap and drop one of the lambdas (note that in pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map):

zfill_cols = ['Date', 'Departure time', 'Arrival time']
df[zfill_cols] = df[zfill_cols].applymap(lambda s: s.zfill(4))

Or on the entire dataframe:

df = df.applymap(lambda s: s.zfill(4))

EDIT: You can also combine DataFrame.apply with Series.str.zfill, which is probably faster because it uses vectorized string methods (unlike Series.apply and DataFrame.applymap):

df[zfill_cols] = df[zfill_cols].apply(lambda se: se.str.zfill(4))

df = df.apply(lambda se: se.str.zfill(4))
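
To make the comparison concrete, here is a small self-contained sketch with invented sample values. It zero-fills the same columns with the vectorized Series.str.zfill and with an element-by-element Series.map, and shows that both give the same strings. Series.map is used for the element-wise version here instead of applymap only so the sketch does not depend on a deprecated method; that substitution is my choice, not the original poster's.

```python
import pandas as pd

# Hypothetical sample data: all values are strings of varying length.
df = pd.DataFrame({
    'Date': ['101', '1231'],
    'Departure time': ['5', '930'],
    'Arrival time': ['45', '1200'],
})
zfill_cols = ['Date', 'Departure time', 'Arrival time']

# Vectorized: one call to the .str accessor per column.
vectorized = df[zfill_cols].apply(lambda se: se.str.zfill(4))

# Element-wise: one Python-level str.zfill call per cell.
elementwise = df[zfill_cols].apply(lambda se: se.map(lambda s: s.zfill(4)))

print(vectorized)
print(vectorized.values.tolist() == elementwise.values.tolist())
```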
