If you only have two choices to select from then use np.where:
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print(df)
yields
Set Type color
0 Z A green
1 Z B green
2 X B red
3 Y C red
If you have more than two conditions then use np.select. For example, if you want color to be
yellowwhen(df['Set'] == 'Z') & (df['Type'] == 'A')- otherwise
bluewhen(df['Set'] == 'Z') & (df['Type'] == 'B') - otherwise
purplewhen(df['Type'] == 'B') - otherwise
black,
then use
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
(df['Set'] == 'Z') & (df['Type'] == 'A'),
(df['Set'] == 'Z') & (df['Type'] == 'B'),
(df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)
which yields
Set Type color
0 Z A yellow
1 Z B blue
2 X B purple
3 Y C black
Answer from unutbu on Stack OverflowVideos
If you only have two choices to select from then use np.where:
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print(df)
yields
Set Type color
0 Z A green
1 Z B green
2 X B red
3 Y C red
If you have more than two conditions then use np.select. For example, if you want color to be
yellowwhen(df['Set'] == 'Z') & (df['Type'] == 'A')- otherwise
bluewhen(df['Set'] == 'Z') & (df['Type'] == 'B') - otherwise
purplewhen(df['Type'] == 'B') - otherwise
black,
then use
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
(df['Set'] == 'Z') & (df['Type'] == 'A'),
(df['Set'] == 'Z') & (df['Type'] == 'B'),
(df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)
which yields
Set Type color
0 Z A yellow
1 Z B blue
2 X B purple
3 Y C black
List comprehension is another way to create another column conditionally. If you are working with object dtypes in columns, like in your example, list comprehensions typically outperform most other methods.
Example list comprehension:
df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit tests:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
%timeit df['color'] = ['red' if x == 'Z' else 'green' for x in df['Set']]
%timeit df['color'] = np.where(df['Set']=='Z', 'green', 'red')
%timeit df['color'] = df.Set.map( lambda x: 'red' if x == 'Z' else 'green')
1000 loops, best of 3: 239 µs per loop
1000 loops, best of 3: 523 µs per loop
1000 loops, best of 3: 263 µs per loop
You can use numpy.where:
def my_fun (var1,var2,var3):
df[var3]= np.where((df[var1]-df[var2])>0, df[var1]-df[var2], 0)
return df
df1 = my_fun('age1','age2','diff')
print (df1)
age1 age2 diff
0 23 10 13
1 45 20 25
2 21 50 0
Error is better explain here.
Slowier solution with apply, where need axis=1 for data processing by rows:
def my_fun(x, var1, var2, var3):
print (x)
if (x[var1]-x[var2])>0 :
x[var3]=x[var1]-x[var2]
else:
x[var3]=0
return x
print (df.apply(lambda x: my_fun(x, 'age1', 'age2','diff'), axis=1))
age1 age2 diff
0 23 10 13
1 45 20 25
2 21 50 0
Also is possible use loc, but sometimes data can be overwritten:
def my_fun(x, var1, var2, var3):
print (x)
mask = (x[var1]-x[var2])>0
x.loc[mask, var3] = x[var1]-x[var2]
x.loc[~mask, var3] = 0
return x
print (my_fun(df, 'age1', 'age2','diff'))
age1 age2 diff
0 23 10 13.0
1 45 20 25.0
2 21 50 0.0
You can use pandas.Series.where
df.assign(age3=(df.age1 - df.age2).where(df.age1 > df.age2, 0))
age1 age2 age3
0 23 10 13
1 45 20 25
2 21 50 0
You can wrap this in a function
def my_fun(v1, v2):
return v1.sub(v2).where(v1 > v2, 0)
df.assign(age3=my_fun(df.age1, df.age2))
age1 age2 age3
0 23 10 13
1 45 20 25
2 21 50 0
A good solution has been provided by @Phoenix based on what you have tried so far. But in the case of an if-else condition, you can use numpy.where function too:
import numpy as np
data["Status"] = np.where(data["sex"] == "female", "f", "m")
The mistake is in the assignment data['Status'] = 'm'. You set all values of this column to m. To correct this and follow your approach, you can iterate through the column using:
for index in range(data.shape[0]):
if data.loc[index,'sex'] == 'male':
data.loc[index,'Status'] = 'm'
else:
data.loc[index,'Status'] = 'f'
There is another efficient solution using map:
val_dict= {'female': 'f', 'male': 'm'}
df['status'] = df['sex'].map(val_dict)