Nest if .. elses:
lambda x: x*10 if x<2 else (x**2 if x<4 else x+10)
Answer from Uriel on Stack Overflow
I do not recommend the use of apply here: it should be avoided if there are better alternatives.
For example, if you are performing the following operation on a Series:
if cond1:
    exp1
elif cond2:
    exp2
else:
    exp3
This is usually a good use case for np.where or np.select.
numpy.where
The if else chain above can be written using
np.where(cond1, exp1, np.where(cond2, exp2, ...))
np.where allows nesting. With one level of nesting, your problem can be solved with,
df['three'] = (
    np.where(
        df['one'] < 2,
        df['one'] * 10,
        np.where(df['one'] < 4, df['one'] ** 2, df['one'] + 10)))
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
numpy.select
Allows for flexible syntax and is easily extensible. It follows the form,
np.select([cond1, cond2, ...], [exp1, exp2, ...])
Or, in this case,
np.select([cond1, cond2], [exp1, exp2], default=exp3)
df['three'] = (
    np.select(
        condlist=[df['one'] < 2, df['one'] < 4],
        choicelist=[df['one'] * 10, df['one'] ** 2],
        default=df['one'] + 10))
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
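One detail worth spelling out: np.select takes the first condition that matches, so the order of condlist matters. The sketch below assumes a sample frame that matches the printed output above (the original post does not show how df was built) and compares the ordering used in the answer with a swapped one.

```python
import numpy as np
import pandas as pd

# Assumed sample frame matching the printed output above.
df = pd.DataFrame({"one": [1, 2, 3, 4, 5], "two": [6, 7, 8, 9, 10]})

# Narrowest condition first: rows with one < 2 take the first choice.
ordered = np.select(
    condlist=[df["one"] < 2, df["one"] < 4],
    choicelist=[df["one"] * 10, df["one"] ** 2],
    default=df["one"] + 10,
)

# Conditions swapped: one < 4 now also captures the rows with one < 2,
# so the `* 10` branch is never reached.
swapped = np.select(
    condlist=[df["one"] < 4, df["one"] < 2],
    choicelist=[df["one"] ** 2, df["one"] * 10],
    default=df["one"] + 10,
)

print(ordered.tolist())  # [10, 4, 9, 14, 15]
print(swapped.tolist())  # [1, 4, 9, 14, 15]
```

Listing the most specific condition first keeps the behaviour in line with an if / elif / else chain.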
and/or (similar to the if/else)
Like the if-else version, this needs a lambda passed to apply:
df['three'] = df["one"].apply(
lambda x: (x < 2 and x * 10) or (x < 4 and x ** 2) or x + 10)
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
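A caveat with the and/or idiom: it falls through to the next branch whenever the selected value is falsy (0, an empty string, and so on). A minimal sketch, using two helper functions named here only for illustration, shows where it diverges from the if-else form:

```python
def pick_and_or(x):
    # and/or idiom from the answer above
    return (x < 2 and x * 10) or (x < 4 and x ** 2) or x + 10


def pick_if_else(x):
    # Equivalent conditional expression
    return x * 10 if x < 2 else (x ** 2 if x < 4 else x + 10)


print(pick_and_or(1), pick_if_else(1))  # 10 10
# For x = 0 the first branch yields 0, which is falsy, so and/or falls
# through to the default branch.
print(pick_and_or(0), pick_if_else(0))  # 10 0
```

For the values 1 to 5 shown above both forms agree, which is why the printed output is the same.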
List Comprehension
Loopy solution that is still faster than apply.
df['three'] = [x*10 if x<2 else (x**2 if x<4 else x+10) for x in df['one']]
# df['three'] = [
#    (x < 2 and x * 10) or (x < 4 and x ** 2) or x + 10 for x in df['one']
# ]
df
one two three
0 1 6 10
1 2 7 4
2 3 8 9
3 4 9 14
4 5 10 15
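To check the speed claim on your own machine, something like the following could be used. It is only a rough sketch: the frame size and the helper name branch are made up for the test, and the timings will vary with pandas version and hardware.

```python
import timeit

import pandas as pd

# Hypothetical larger frame so the timing difference is visible.
df = pd.DataFrame({"one": range(1, 20_001)})


def branch(x):
    return x * 10 if x < 2 else (x ** 2 if x < 4 else x + 10)


via_apply = df["one"].apply(branch)
via_listcomp = pd.Series([branch(x) for x in df["one"]], index=df.index)

# Both routes produce the same values.
assert via_apply.tolist() == via_listcomp.tolist()

t_apply = timeit.timeit(lambda: df["one"].apply(branch), number=3)
t_listcomp = timeit.timeit(lambda: [branch(x) for x in df["one"]], number=3)
print(f"apply: {t_apply:.3f}s  list comprehension: {t_listcomp:.3f}s")
```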
In [1]: df
Out[1]:
data
0 1
1 2
2 3
3 4
You want to apply a function that conditionally returns a value based on the selected dataframe column.
In [2]: df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
Out[2]:
0 true
1 true
2 false
3 false
Name: data
You can then assign that returned column to a new column in your dataframe:
In [3]: df['desired_output'] = df['data'].apply(lambda x: 'true' if x <= 2.5 else 'false')
In [4]: df
Out[4]:
data desired_output
0 1 true
1 2 true
2 3 false
3 4 false
Just compare the column with that value:
In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])
In [10]: df
Out[10]:
data
0 1
1 2
2 3
3 4
In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
Out[12]:
data desired
0 1 False
1 2 False
2 3 True
3 4 True
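Note that the comparison above gives booleans and uses > 2.5, which is the inverse of the x <= 2.5 test in the apply version. If the string labels from the apply version are wanted without a lambda, one possible vectorised sketch (using np.where, which that answer did not mention) is:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Vectorised equivalent of the apply/lambda version above.
df["desired_output"] = np.where(df["data"] <= 2.5, "true", "false")
print(df["desired_output"].tolist())  # ['true', 'true', 'false', 'false']
```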
Is that what you want?
In [300]: frame[['b','c']].apply(lambda x: x['c'] if x['c']>0 else x['b'], axis=1)
Out[300]:
0 -1.099891
1 0.582815
2 0.901591
3 0.900856
dtype: float64
Solution
Use a vectorized approach:
frame['d'] = frame.b + (frame.c > 0) * (frame.c - frame.b)
Explanation
This is derived from the sum of
(frame.c > 0) * frame.c # frame.c if positive
Plus
(frame.c <= 0) * frame.b # frame.b if c is not positive
However
(frame.c <= 0)
is equivalent (for non-missing values) to
(1 - (frame.c > 0))
and when combined you get
frame['d'] = frame.b + (frame.c > 0) * (frame.c - frame.b)
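As a quick sanity check of the formula, the sketch below compares it with a plain np.where selection. The data is randomly generated here, since the original frame is not shown in full.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the original `frame` is not shown in full.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((6, 2)), columns=["b", "c"])

# Arithmetic form from the answer above.
arithmetic = frame.b + (frame.c > 0) * (frame.c - frame.b)

# Direct selection: c where c is positive, otherwise b.
selected = np.where(frame.c > 0, frame.c, frame.b)

print(np.allclose(arithmetic, selected))  # True
```

np.allclose is used rather than exact equality because b + (c - b) can differ from c by floating-point rounding.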
Apply across columns
Use pd.DataFrame.apply instead of pd.Series.apply and specify axis=1:
df['one'] = df.apply(lambda row: row['one']*100 if row['two']>8 else \
(row['one']*1 if row['two']<8 else row['one']**2), axis=1)
Unreadable? Yes, I agree. Let's try again but this time rewrite as a named function.
Using a function
Note lambda is just an anonymous function. We can define a function explicitly and use it with pd.DataFrame.apply:
def calc(row):
    if row['two'] > 8:
        return row['one'] * 100
    elif row['two'] < 8:
        return row['one']
    else:
        return row['one']**2

df['one'] = df.apply(calc, axis=1)
Readable? Yes. But this isn't vectorised. We're looping through each row one at a time. We might as well have used a list. Pandas isn't just for clever table formatting; you can use it for vectorised calculations on arrays held in contiguous memory blocks. So let's try one more time.
Vectorised calculations
Using numpy.where:
df['one'] = np.where(df['two'] > 8, df['one'] * 100,
                     np.where(df['two'] < 8, df['one'],
                              df['one']**2))
There we go. Readable and efficient. We have effectively vectorised our if / else statements. Does this mean that we are doing more calculations than necessary? Yes! But this is more than offset by the way in which we are performing the calculations, i.e. with well-defined blocks of memory rather than pointers. On large frames you will typically find an order of magnitude performance improvement over row-wise apply.
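To confirm that the row-wise and vectorised versions agree, here is a small self-contained check. The sample frame is an assumption (the answer does not show its data), and the whole-column expressions are built from df rather than from a single row.

```python
import numpy as np
import pandas as pd

# Assumed sample frame; the column names follow the answer above.
df = pd.DataFrame({"one": [1, 2, 3, 4, 5], "two": [6, 7, 8, 9, 10]})


def calc(row):
    if row["two"] > 8:
        return row["one"] * 100
    elif row["two"] < 8:
        return row["one"]
    else:
        return row["one"] ** 2


# Row-wise: calc is called once per row.
row_wise = df.apply(calc, axis=1)

# Vectorised: each branch is computed for the whole column, then selected.
vectorised = np.where(df["two"] > 8, df["one"] * 100,
                      np.where(df["two"] < 8, df["one"], df["one"] ** 2))

print(row_wise.tolist())    # [1, 2, 9, 400, 500]
print(vectorised.tolist())  # [1, 2, 9, 400, 500]
```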
Another example
Well, we can just use numpy.where again.
df['one'] = np.where(df['name'].isin(['a', 'b']), 100, df['two'])
You can do
df.apply(lambda x: x["one"] + x["two"], axis=1)
but I don't think that a lambda as long as lambda x: x["one"]*100 if x["two"]>8 else (x["one"]*1 if x["two"]<8 else x["one"]**2) is very Pythonic. apply takes any callback:
def my_callback(x):
    if x["two"] > 8:
        return x["one"]*100
    elif x["two"] < 8:
        return x["one"]
    else:
        return x["one"]**2

df.apply(my_callback, axis=1)
The easiest way to do this is:
df["pk_day"] = df["Date"].dt.weekday.lt(5)
Now, for why your second statement does not work. You're using:
df[(df['Date'].dt.strftime('%a')=='Sat')|(df['Date'].dt.strftime('%a')=='Sun')]
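This returns the rows in the DataFrame where the day is a weekend. Hence, it is not boolean. Wrapping the mask in a lambda with False if ... else True does not help either, because the mask is a whole Series and its truth value is ambiguous. You could negate the mask instead of indexing with it:
df['pk_day'] = ~((df['Date'].dt.strftime('%a')=='Sat')|(df['Date'].dt.strftime('%a')=='Sun'))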
Your last statement works without any errors.
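For reference, the weekday-based one-liner from the start of this answer can be tried on a few made-up dates. The frame below is hypothetical; only the Date column name comes from the question.

```python
import pandas as pd

# Hypothetical dates: 2024-01-05 is a Friday, so this runs Fri..Mon.
df = pd.DataFrame({"Date": pd.date_range("2024-01-05", periods=4, freq="D")})

# Monday=0 ... Sunday=6, so values below 5 are weekdays.
df["pk_day"] = df["Date"].dt.weekday.lt(5)
print(df["pk_day"].tolist())  # [True, False, False, True]
```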
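Keep it simple; the following will do the job when applied to each date, for example with df['Date'].apply(pk_day). It returns True for weekend dates, so use not in if pk_day should flag weekdays instead.
pk_day = lambda pkday: pkday.strftime('%a') in ('Sat', 'Sun')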
The issue was that you were first filling the NaN and then using .str.split(), so the equality should be with a list, not the element of the list. You can see this by first checking what x is in your lambda function.
dfs['freq'].str.split(',')
#0 [text1]
#1 [text1, text2, text1]
#2 [text1, text2, text3]
#3 [text1]
#4 [text1, text2, text3, text4, text5]
#5 [no_guide]
#6 [text1, text2, text3, text4, text5, text6]
The correct equality to check is whether x is a list whose only element is 'no_guide':
lambda x: 0 if x == ['no_guide'] else len(set(x))
Since len(set(x)) returns a number, you may also want to return 0 and not the string '0'.
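Put together, the corrected lambda could be used roughly as follows. The three-row frame is made up to resemble the question's freq column, and the counts column name is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's `freq` column.
dfs = pd.DataFrame({"freq": ["text1", "text1,text2,text1", np.nan]})

# Fill missing values first, then split: every element becomes a list.
parts = dfs["freq"].fillna("no_guide").str.split(",")

# Compare against the one-element list, and return the integer 0.
dfs["counts"] = parts.apply(lambda x: 0 if x == ["no_guide"] else len(set(x)))
print(dfs["counts"].tolist())  # [1, 2, 0]
```

Because set(x) removes duplicates, the second row counts 2 unique values rather than 3 items.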
You could use this:
df['freq'] = df['freq'].fillna('no_guide')
df['counts'] = df['freq'].str.split(',', expand=True)\
    .apply(lambda x: x.str.contains('text')).sum(1)
df
Output:
guide freq counts
0 g1 text1 1.0
1 g2 text1,text2,text1 3.0
2 g3 text1,text2,text3 3.0
3 g4 text1 1.0
4 g5 text1,text2,text3,text4,text5 5.0
5 g6 no_guide 0.0
6 g7 text1,text2,text3,text4,text5,text6 6.0
Hello everyone,
I can't figure this one out, unfortunately: How do I use an if-else statement in the assign function from pandas to create a new column called 'id'? I know that I could do it alternatively with np.where at the beginning, like in the commented out line, but for future applications, I would like to know how to do it properly in the pipe chain. I tried a lambda function, but it throws the following error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Help to solve this is much appreciated!
(data
# .assign(country_indicator = np.where(data.country == "Switzerland", "CH", "EU"))
.query("continent == 'Europe'")
.reset_index(drop=True)
.assign(id = lambda df: "CH" if df.country == "Switzerland" else "EU")
)
The data is the gapminder dataset and the relevant columns look like this:
index   country          continent
12      Albania          Europe
13      Albania          Europe
14      Albania          Europe
15      Albania          Europe
16      Albania          Europe
...     ...              ...
1603    United Kingdom   Europe
1604    United Kingdom   Europe
1605    United Kingdom   Europe
1606    United Kingdom   Europe
1607    United Kingdom   Europe
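One way this could be handled inside the chain is to keep the lambda but make the condition element-wise with np.where. The lambda passed to assign receives the whole intermediate DataFrame, so df.country == "Switzerland" is a Series, and a plain if / else cannot reduce it to a single True or False. The sketch below uses a small made-up frame in place of gapminder.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the gapminder columns used in the question.
data = pd.DataFrame({
    "country": ["Albania", "Switzerland", "Brazil", "United Kingdom"],
    "continent": ["Europe", "Europe", "Americas", "Europe"],
})

result = (
    data
    .query("continent == 'Europe'")
    .reset_index(drop=True)
    # The lambda receives the whole intermediate frame, so the condition
    # must be evaluated element-wise rather than with a scalar if/else.
    .assign(id=lambda df: np.where(df.country == "Switzerland", "CH", "EU"))
)
print(result["id"].tolist())  # ['EU', 'CH', 'EU']
```

Because the lambda is evaluated on the filtered frame, the new column lines up with the rows that survive the query step.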