Alternatively, you can use loc:
import pandas as pd
df = pd.DataFrame({"age": [-100, 300, 400, 500, 600, 700]})
# Select the rows and the "age" column in a single .loc call so the write goes to df itself
df.loc[(df["age"] < 500) & (df["age"] >= 0), "age"] = 0
Now your df looks like this:
age
0 -100
1 0
2 0
3 500
4 600
5 700
Answer from Jonathan on Stack Exchange
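A minimal sketch of the same update, assuming pandas 1.3 or later (where Series.between accepts inclusive="left"). It passes the row mask and the column label to one .loc call. A chained form such as df["age"].loc[...] = 0 may not update df in recent pandas versions with copy-on-write enabled.

```python
import pandas as pd

df = pd.DataFrame({"age": [-100, 300, 400, 500, 600, 700]})

# Boolean row mask for 0 <= age < 500 ("left": lower bound inclusive, upper exclusive)
mask = df["age"].between(0, 500, inclusive="left")

# Row mask and column label in one .loc call, so df itself is modified
df.loc[mask, "age"] = 0

print(df)
```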
You can chain conditional expressions (a if cond else b) within the lambda function.
Or
write a named function and pass it to apply on your series instead of a lambda.
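A minimal sketch of both options, using a hypothetical series s and hypothetical labels "small", "medium" and "large". The lambda chains conditional expressions; the named function expresses the same logic with if/elif.

```python
import pandas as pd

# Hypothetical sample data for illustration
s = pd.Series([5, 50, 500])

# Option 1: chained conditional expressions inside a lambda
labels_lambda = s.apply(lambda x: "small" if x < 10 else "medium" if x < 100 else "large")

# Option 2: the same logic in a named function, passed to apply
def classify(x):
    if x < 10:
        return "small"
    elif x < 100:
        return "medium"
    return "large"

labels_func = s.apply(classify)

print(labels_lambda.tolist())  # ['small', 'medium', 'large']
print(labels_lambda.equals(labels_func))  # True
```

Once there are more than two or three branches, the named function is usually easier to read and test.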
How to write a lambda function that is conditional on two variables (columns) in python - Stack Overflow
python - Using lambda if condition on different columns in Pandas dataframe - Stack Overflow
python - Conditional Logic on Pandas DataFrame - Stack Overflow
Applying multiple filters to a list.
Basically, lambda x: ... is just a short one-line function. What apply really asks for is a function, which you can easily write yourself.
Here is a small example that you can build upon:
import pandas as pd
# Recreate the dataframe
data = dict(Size=[80000, 8000000, 800000000])
df = pd.DataFrame(data)
# Create a function that returns the desired label for each value
# Only the upper bound needs checking: a value reaches an elif only if every earlier test failed
def func(x):
    if x < 1e6:
        return "<1m"
    elif x < 1e7:
        return "1-10m"
    elif x < 5e7:
        return "10-50m"
    # Add more elif branches here as needed
    else:
        return "N/A"
df['Classification'] = df['Size'].apply(func)
print(df)
Returns:
Size Classification
0 80000 <1m
1 8000000 1-10m
2 800000000 N/A
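A vectorised alternative, swapped in here rather than taken from the answer above: pd.cut assigns every value to a bin in one call. The bin edges are chosen to mirror func, and the result is a categorical column rather than plain strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Size": [80000, 8000000, 800000000]})

# Bin edges mirroring func(): [-inf, 1e6), [1e6, 1e7), [1e7, 5e7), [5e7, inf)
bins = [-np.inf, 1e6, 1e7, 5e7, np.inf]
labels = ["<1m", "1-10m", "10-50m", "N/A"]

# right=False closes each bin on the left, so x < 1e6 lands in "<1m"
df["Classification"] = pd.cut(df["Size"], bins=bins, labels=labels, right=False)

print(df)
```

Call .astype(str) on the result if you need ordinary string values.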
The apply/lambda approach actually does the job here. I'm not sure what the problem was, as your syntax looks OK and it works:
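import pandas as pd
df1 = [80000, 8000000, 8000000000, 800000000000]
df = pd.DataFrame(df1)
df.columns = ['size']
# Each else branch is reached only when the previous test failed, so only upper bounds are needed
df['Classification'] = df['size'].apply(lambda x: '<1m' if x < 1000000 else '1-10m' if x < 10000000 else '1bi')
print(df)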
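Chained conditional expressions group to the right, so a if c1 else b if c2 else d means a if c1 else (b if c2 else d). A small sketch with a hypothetical helper label, written so that each branch tests only its upper bound; that way boundary values such as exactly 1,000,000 still land in a bucket.

```python
# Parsed as: '<1m' if x < 1000000 else ('1-10m' if x < 10000000 else '1bi')
def label(x):
    return '<1m' if x < 1000000 else '1-10m' if x < 10000000 else '1bi'

print(label(999999))      # <1m
print(label(1000000))     # 1-10m
print(label(8000000000))  # 1bi
```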
Use where:
df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])
This will be much faster than performing an apply operation as it is vectorised.
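A minimal runnable version, assuming a DataFrame with hypothetical numeric columns x and y. Series.where keeps the original value where the condition is True and takes the value from the second argument, aligned by row, everywhere else.

```python
import pandas as pd

# Hypothetical columns x and y
df = pd.DataFrame({"x": [150, 150, 50], "y": [10, 80, 10]})

# Keep x where both conditions hold; otherwise take y from the same row
df["dummyVar"] = df["x"].where((df["x"] > 100) & (df["y"] < 50), df["y"])

print(df["dummyVar"].tolist())  # [150, 80, 10]
```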
Like this:
f = lambda x, y: x if x > 100 and y < 50 else y
A lambda in Python is equivalent to a normal function definition:
def f(x, y):
    return x if x > 100 and y < 50 else y
NB: The body of a lambda must be a single expression, so you cannot use statements such as return; a lambda returns the value of that expression.
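A short sketch checking that the lambda and the def give the same results, then applying the function row by row to a hypothetical DataFrame with columns x and y. With axis=1, apply passes each row as a Series, so the function reads both columns from it.

```python
import pandas as pd

f_lambda = lambda x, y: x if x > 100 and y < 50 else y

def f_def(x, y):
    return x if x > 100 and y < 50 else y

# Both forms return the same value for the same inputs
assert f_lambda(150, 10) == f_def(150, 10) == 150
assert f_lambda(150, 80) == f_def(150, 80) == 80

# Hypothetical DataFrame; axis=1 passes one row at a time
df = pd.DataFrame({"x": [150, 150, 50], "y": [10, 80, 10]})
df["dummyVar"] = df.apply(lambda row: f_def(row["x"], row["y"]), axis=1)

print(df["dummyVar"].tolist())  # [150, 80, 10]
```

This row-wise apply is easy to read but runs Python code once per row, which is why a vectorised Series.where is usually preferred on large frames.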
For some good reading see:
- Defining Functions
- Lambdas
Is that what you want?
In [300]: frame[['b','c']].apply(lambda x: x['c'] if x['c']>0 else x['b'], axis=1)
Out[300]:
0 -1.099891
1 0.582815
2 0.901591
3 0.900856
dtype: float64
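The values of frame are not shown, so here is a self-contained sketch with a hypothetical frame containing columns b and c. It runs the same row-wise apply: take c when it is positive, otherwise fall back to b.

```python
import pandas as pd

# Hypothetical frame; the original data is not shown
frame = pd.DataFrame({"b": [1.0, 2.0, 3.0], "c": [-0.5, 0.5, 0.0]})

# Row-wise: c when positive, otherwise b (c == 0 counts as not positive)
frame["d"] = frame[["b", "c"]].apply(lambda x: x["c"] if x["c"] > 0 else x["b"], axis=1)

print(frame["d"].tolist())  # [1.0, 0.5, 3.0]
```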
Solution
Use a vectorized approach:
frame['d'] = frame.b + (frame.c > 0) * (frame.c - frame.b)
Explanation
This is derived from the sum of
(frame.c > 0) * frame.c    # frame.c where c is positive
plus
(frame.c <= 0) * frame.b   # frame.b where c is not positive
However,
(frame.c <= 0)
is equivalent to
(1 - (frame.c > 0))
so the sum becomes (frame.c > 0) * frame.c + (1 - (frame.c > 0)) * frame.b, which simplifies to
frame['d'] = frame.b + (frame.c > 0) * (frame.c - frame.b)
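A quick check of the derivation on a hypothetical frame, comparing the arithmetic form with Series.where, which expresses the same selection directly. The boolean mask is treated as 0/1 when multiplied. The values are chosen so the floating-point arithmetic is exact.

```python
import pandas as pd

# Hypothetical frame for illustration
frame = pd.DataFrame({"b": [1.0, 2.0, 3.0], "c": [-0.5, 0.5, 0.0]})

# Arithmetic form: b + mask * (c - b) gives c where mask is True, b otherwise
frame["d"] = frame.b + (frame.c > 0) * (frame.c - frame.b)

# Same selection with Series.where, for comparison
expected = frame.c.where(frame.c > 0, frame.b)

print(frame["d"].tolist())  # [1.0, 0.5, 3.0]
print(expected.tolist())    # [1.0, 0.5, 3.0]
```

Series.where (or numpy.where) is usually clearer to read, and it also avoids the arithmetic form's problem with NaN or inf in the unselected column, since 0 * inf is NaN.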
In [1]: df
Out[1]:
data
0 1
1 2
2 3
3 4
You want to apply a function that conditionally returns a value based on the selected dataframe column.
In [2]: df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
Out[2]:
0 true
1 true
2 false
3 false
Name: data
You can then assign that returned column to a new column in your dataframe:
In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
In [4]: df
Out[4]:
data desired_output
0 1 true
1 2 true
2 3 false
3 4 false
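For larger frames, a vectorised alternative to the row-by-row apply (swapped in here, not part of the answer) is numpy.where, which evaluates the condition for the whole column at once and picks one of two values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Condition evaluated for the whole column; "true" where it holds, "false" elsewhere
df["desired_output"] = np.where(df["data"] <= 2.5, "true", "false")

print(df)
```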
Just compare the column with that value:
In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])
In [10]: df
Out[10]:
data
0 1
1 2
2 3
3 4
In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
Out[12]:
data desired
0 1 False
1 2 False
2 3 True
3 4 True
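The comparison produces a real boolean column. If string labels are needed instead, a sketch that maps the booleans explicitly (the column name desired_str is hypothetical):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# The comparison yields a boolean column directly
df["desired"] = df["data"] > 2.5

# Map booleans to string labels only if strings are actually required
df["desired_str"] = df["desired"].map({True: "true", False: "false"})

print(df)
```

Keeping the boolean column is usually more convenient, because it can be used directly as a filter, for example df[df["desired"]].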