In [1]: df
Out[1]:
data
0 1
1 2
2 3
3 4
You want to apply a function that conditionally returns a value based on the selected dataframe column.
In [2]: df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
Out[2]:
0 true
1 true
2 false
3 false
Name: data
You can then assign that returned column to a new column in your dataframe:
In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
In [4]: df
Out[4]:
data desired_output
0 1 true
1 2 true
2 3 false
3 4 false
Answer from Zelazny7 on Stack Overflow
Just compare the column with that value:
In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])
In [10]: df
Out[10]:
data
0 1
1 2
2 3
3 4
In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
Out[12]:
data desired
0 1 False
1 2 False
2 3 True
3 4 True
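If you need the string labels from the first answer rather than booleans, the vectorized comparison can be combined with np.where. A minimal sketch, assuming the same one-column frame as above; the column name `label` is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Vectorized comparison: produces a boolean Series, no lambda needed.
mask = df["data"] <= 2.5

# Map True/False to string labels in one vectorized step.
# "label" is a hypothetical column name.
df["label"] = np.where(mask, "true", "false")

print(df)
```

This gives the same labels as the apply version, without calling a Python function per row.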
Lambda if statement
Hello,
I'm trying to do a lambda statement where I check the type of the value, how do I implement if in lambda?
something['SomeVariable'].apply(lambda x: <if isinstance(x, str) = true, dosomething>)
If you want to leave the value unchanged you can set the else value to the lambda argument x:
df.col.apply(lambda x: new_value if some_condition else x)
Applied to your example:
tmp_df = someDataframe.groupby('ID').myCol.apply(lambda x: 'a' if (x=='A').any() else x)
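For the type check asked about in the question, the same pattern works with isinstance in the condition. A small sketch with made-up data; upper-casing is only a stand-in for whatever "dosomething" is meant to be:

```python
import pandas as pd

# Mixed-type column; the values are invented for illustration.
s = pd.Series(["abc", 1, "de", 2.5], name="SomeVariable")

# Transform strings, leave every other value unchanged.
result = s.apply(lambda x: x.upper() if isinstance(x, str) else x)

print(result.tolist())  # ['ABC', 1, 'DE', 2.5]
```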
Depending on the complexity there is probably a more efficient answer here. I use np.where to get your answer.
import numpy as np
df['conditional'] = np.where(
(df['conditional'] == 'A'), # Condition
'a', # Value if true
df['conditional'] # Value if false
# /\ this is equal to the original value so it has no effect
)
Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
Nest if .. elses:
lambda x: x*10 if x<2 else (x**2 if x<4 else x+10)
I do not recommend the use of apply here: it should be avoided if there are better alternatives.
For example, if you are performing the following operation on a Series:
if cond1:
    exp1
elif cond2:
    exp2
else:
    exp3
This is usually a good use case for np.where or np.select.
numpy.where
The if else chain above can be written using
np.where(cond1, exp1, np.where(cond2, exp2, ...))
np.where allows nesting. With one level of nesting, your problem can be solved with,
df['three'] = (
    np.where(
        df['one'] < 2,
        df['one'] * 10,
        np.where(df['one'] < 4, df['one'] ** 2, df['one'] + 10))
)
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
numpy.select
Allows for flexible syntax and is easily extensible. It follows the form,
np.select([cond1, cond2, ...], [exp1, exp2, ...])
Or, in this case,
np.select([cond1, cond2], [exp1, exp2], default=exp3)
df['three'] = (
np.select(
condlist=[df['one'] < 2, df['one'] < 4],
choicelist=[df['one'] * 10, df['one'] ** 2],
default=df['one'] + 10))
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
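One detail worth keeping in mind: for each element, np.select uses the choice belonging to the first condition that is true, so the conditions do not have to be mutually exclusive, but their order matters. A sketch using the same values of `one` as in the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4, 5]})

conds = [df["one"] < 2, df["one"] < 4]
choices = [df["one"] * 10, df["one"] ** 2]

# First true condition wins: 1 satisfies both conditions but gets one * 10.
in_order = np.select(conds, choices, default=df["one"] + 10)

# Reversing the order changes the result, because one < 4 now shadows one < 2.
reversed_order = np.select(conds[::-1], choices[::-1], default=df["one"] + 10)

print(in_order.tolist())        # [10, 4, 9, 14, 15]
print(reversed_order.tolist())  # [1, 4, 9, 14, 15]
```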
and/or (similar to the if/else)
Like the if/else form, this requires a lambda passed to apply:
df['three'] = df["one"].apply(
lambda x: (x < 2 and x * 10) or (x < 4 and x ** 2) or x + 10)
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
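A caveat with the and/or form: it relies on truthiness, so it silently falls through to the next branch whenever the selected expression evaluates to something falsy, such as 0. A sketch with an extra input of 0 (not in the original data) that shows where it differs from the if/else form:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5])

and_or = s.apply(lambda x: (x < 2 and x * 10) or (x < 4 and x ** 2) or x + 10)
if_else = s.apply(lambda x: x * 10 if x < 2 else (x ** 2 if x < 4 else x + 10))

# For x = 0: (True and 0) is 0, which is falsy, so evaluation falls through
# to the next branch: (True and 0 ** 2) is also 0, and finally 0 + 10 = 10.
print(and_or.tolist())   # [10, 10, 4, 9, 14, 15]
print(if_else.tolist())  # [0, 10, 4, 9, 14, 15]
```

For the data shown above the two forms agree, because none of the selected expressions is ever 0.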
List Comprehension
A loop-based solution that is still usually faster than apply.
df['three'] = [x*10 if x<2 else (x**2 if x<4 else x+10) for x in df['one']]
# df['three'] = [
#     (x < 2 and x * 10) or (x < 4 and x ** 2) or x + 10 for x in df['one']
# ]
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
If some of the values you pass are shorter than 5 characters, you can guard the slice with a length check:
df['ZipCode'] = df.ZipCode.apply(lambda x: x[-4:] if len(x) > 5 else x)
If no value has exactly 5 characters, you can do it more simply and skip the length check, because slicing a shorter string just returns the whole string:
df['ZipCode'] = df.ZipCode.apply(lambda x: x[-4:])
If all you want is to trim the strings to at most 5 characters, you can use df.ZipCode.str[:5].
In [78]: df
Out[78]:
ZipCode
0 123456789
1 123
2 0
For the first 5 characters of each zip code:
In [79]: df.ZipCode.str[:5]
Out[79]:
0 12345
1 123
2 0
Name: ZipCode, dtype: object
For the characters beyond the first 5:
In [80]: df.ZipCode.str[5:]
Out[80]:
0 6789
1
2
Name: ZipCode, dtype: object
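One practical difference between the two approaches: the .str accessor passes missing values through, while a lambda that calls len(x) or slices x raises a TypeError on NaN. A sketch, with a missing value added to the example data for illustration:

```python
import numpy as np
import pandas as pd

# The NaN entry is added for illustration; it is not in the original frame.
df = pd.DataFrame({"ZipCode": ["123456789", "123", "0", np.nan]})

# Vectorized slicing: NaN stays missing, short strings are returned unchanged.
first_five = df["ZipCode"].str[:5]

# An equivalent lambda needs an explicit guard for non-string values.
guarded = df["ZipCode"].apply(lambda x: x[:5] if isinstance(x, str) else x)

print(first_five.tolist())
```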
Is this what you want?
In [300]: frame[['b','c']].apply(lambda x: x['c'] if x['c']>0 else x['b'], axis=1)
Out[300]:
0 -1.099891
1 0.582815
2 0.901591
3 0.900856
dtype: float64
Solution
Use a vectorized approach:
frame['d'] = frame.b + (frame.c > 0) * (frame.c - frame.b)
Explanation
This is derived from the sum of
(frame.c > 0) * frame.c # frame.c if positive
Plus
(frame.c <= 0) * frame.b # frame.b if c is not positive
However
(frame.c <= 0)
is equivalent to
1 - (frame.c > 0)
so the sum becomes (frame.c > 0) * frame.c + (1 - (frame.c > 0)) * frame.b, and collecting the terms gives
frame['d'] = frame.b + (frame.c > 0) * (frame.c - frame.b)
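The same selection can also be written without arithmetic on booleans, using np.where or Series.where. A sketch with made-up values for b and c (the original frame holds random floats), checking that both give the same result as the formula above:

```python
import numpy as np
import pandas as pd

# Invented sample values for illustration.
frame = pd.DataFrame({"b": [-1.0, 0.5, -2.0, 3.0],
                      "c": [-0.5, 0.25, 1.5, -4.0]})

# Arithmetic form from the answer above.
arith = frame.b + (frame.c > 0) * (frame.c - frame.b)

# np.where: take c where c > 0, otherwise b.
frame["d"] = np.where(frame.c > 0, frame.c, frame.b)

# Series.where keeps c where the condition holds and substitutes b elsewhere.
alt = frame.c.where(frame.c > 0, frame.b)

print(frame)
```

Which form to use is mostly a readability choice; np.where and Series.where state the condition and the two alternatives directly.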