Looking at this question a couple of years later, I see the error: to coerce the returned result so that it assigns correctly, you need to access the scalar values and use those in the assignment so they align as desired:
In [22]:
df.loc[df['A'] == 1, ['A', 'B']] = df['C'].values[0] + 10, df['C'].values[0] + 11
df
Out[22]:
A B C
0 11 12 1
1 2 2 2
2 3 3 3
Answer from EdChum on Stack Overflow.
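The tuple-of-scalars assignment can be reproduced end to end. A minimal sketch, assuming a starting frame in which A, B and C each hold [1, 2, 3] (the input frame is not shown in the transcript above):

```python
import pandas as pd

# Hypothetical starting frame, consistent with the Out[22] block above
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

# The right-hand side is a tuple of scalars: one value per selected column
df.loc[df['A'] == 1, ['A', 'B']] = df['C'].values[0] + 10, df['C'].values[0] + 11
print(df)
```

Only the row where A == 1 changes; the scalars broadcast one per column.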
I'm not sure whether it's the best way to achieve that, but it works:
In [284]: df.loc[df['A'] == 1, ['A', 'B']] = pd.DataFrame({'A':df.C + 10, 'B':df.C + 11}, index=df.index)
In [285]: df
Out[285]:
A B C
0 11 12 1
1 2 2 2
2 3 3 3
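This alignment-based assignment can also be checked end to end; a sketch assuming a starting frame in which A, B and C each hold [1, 2, 3] (the input is not shown in the transcript above):

```python
import pandas as pd

# Hypothetical starting frame consistent with the Out[] blocks above
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

# The right-hand side DataFrame aligns on index and column labels,
# so only the selected rows of A and B are overwritten
df.loc[df['A'] == 1, ['A', 'B']] = pd.DataFrame(
    {'A': df.C + 10, 'B': df.C + 11}, index=df.index)
print(df)
```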
Sources:
data mining - Pandas change value of a column based another column condition - Data Science Stack Exchange
python - pandas : update value if condition in 3 columns are met - Stack Overflow
python - Pandas update multiple columns at once - Stack Overflow
python - Update row values where certain condition is met in pandas - Stack Overflow
What I want to achieve: where column2 == 2, leave it as 2 if column1 < 30, else change it to 3 if column1 > 90.
This can be simplified into where (column2 == 2 and column1 > 90) set column2 to 3. The column1 < 30 part is redundant, since the value of column2 is only going to change from 2 to 3 if column1 > 90.
In the code that you provide, you are using pandas function replace, which operates on the entire Series, as stated in the reference:
Values of the Series are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
This means that for each iteration of for x in filter1 your code performs global replacement, which is not what you want to do - you want to update the specific row of column2 that corresponds to x from column1 (which you are iterating over).
the problem is 2 does not change to 3 where column1 > 90
This is truly strange. I would expect the code you provided to have changed every instance of 2 in column2 to 3 as soon as it encountered an x >= 30, as dictated by your code conditional statement (the execution of the else branch). This discrepancy may stem from the fact that you are assigning to column2 the result of global replacement performed on the column Output (the contents of which are unknown). In any case, if you want your program to do something under a specific condition, such as x > 90, it should be explicitly stated in the code. You should also note that the statement data['column2'] = data['column2'].replace([2], [2]) achieves nothing, since 2 is being replaced with 2 and the same column is both the source and the destination.
What you could use to solve this particular task is a boolean mask (or the query method). Both are explained in an excellent manner in this question.
Using a boolean mask would be the easiest approach in your case:
mask = (data['column2'] == 2) & (data['column1'] > 90)
data.loc[mask, 'column2'] = 3
The first line builds a Series of booleans (True/False) that indicate whether the supplied condition is satisfied.
The second line assigns the value 3 to those rows of column2 where the mask is True.
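Put together with some made-up sample data (the question's actual frame is not shown), the mask approach looks like this:

```python
import pandas as pd

# Hypothetical data matching the question's column names
data = pd.DataFrame({'column1': [10, 95, 50, 99],
                     'column2': [2, 2, 3, 5]})

mask = (data['column2'] == 2) & (data['column1'] > 90)
data.loc[mask, 'column2'] = 3  # .loc avoids SettingWithCopyWarning
print(data)
```

Only the second row changes: it is the only one with column2 == 2 and column1 > 90.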
I've had success approaching this in a slightly different way.
import numpy as np

# Where column1 < 30 and column2 == 2, keep column2 as is
# (a deliberate no-op, spelled out to mirror the question's first branch)
data['column2'] = np.where((data['column1'] < 30)
                           & (data['column2'] == 2),  # rows the rule applies to
                           data['column2'],           # value used where the condition holds
                           data['column2'])           # value used everywhere else

data['column2'] = np.where((data['column1'] > 90)
                           & (data['column2'] == 2),  # for rows with column1 > 90
                           data['column3'],           # take the column3 value
                           data['column2'])           # otherwise keep column2
This is a little wordier than a loop, but I've found it to be the most intuitive way to do this sort of data manipulation with pandas.
Using:
df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'
gives the warning:
/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
A better way of achieving this seems to be:
df.loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square'), 'D'] = 'succeed'
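A runnable sketch of the .loc form with made-up data (the frame and its values are assumptions):

```python
import pandas as pd

df = pd.DataFrame({'A': ['blue', 'green'],
                   'B': ['red', 'red'],
                   'C': ['square', 'circle'],
                   'D': ['', '']})

# A single .loc call selects and assigns on the original frame,
# so no SettingWithCopyWarning is raised
df.loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square'), 'D'] = 'succeed'
print(df)
```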
You want to replace:
print(df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']])
Col1 Col2 Col3
2 NaN NaN NaN
3 NaN NaN NaN
With:
replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
print(replace_with_this)
col1_v2 col2_v2 col3_v2
2 a b d
3 d e f
Seems reasonable. However, when you do the assignment, you need to account for index alignment, which includes columns.
So, this should work:
df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']] = replace_with_this.values
print(df)
Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0 A B C NaN NaN NaN
1 D E F NaN NaN NaN
2 a b d a b d
3 d e f d e f
I accounted for columns by using .values at the end. This stripped the column information from the replace_with_this dataframe and just used the values in the appropriate positions.
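The .values fix can be reproduced end to end; the frame below is reconstructed from the printout above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'D', np.nan, np.nan],
                   'Col2': ['B', 'E', np.nan, np.nan],
                   'Col3': ['C', 'F', np.nan, np.nan],
                   'col1_v2': [np.nan, np.nan, 'a', 'd'],
                   'col2_v2': [np.nan, np.nan, 'b', 'e'],
                   'col3_v2': [np.nan, np.nan, 'd', 'f']})

replace_with_this = df.loc[df['Col1'].isnull(), ['col1_v2', 'col2_v2', 'col3_v2']]
# .values strips the column labels, so the assignment is purely positional
df.loc[df['Col1'].isnull(), ['Col1', 'Col2', 'Col3']] = replace_with_this.values
print(df)
```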
In the "take the hill" spirit, I offer the below solution which yields the requested result.
I realize this is not exactly what you are after as I am not slicing the df (in the reasonable - but non functional - way in which you propose).
import numpy as np

# Indexing on np.nan does not work, so fill with an arbitrary sentinel value
df = df.fillna('AAA')
# Mask to determine which rows to update
mask = df['Col1'] == 'AAA'
# Dict with key/value pairs for the columns to be updated
mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}
# Update; index alignment fills only the masked rows
for k in mp:
    df.loc[mask, k] = df[mp[k]]
# Swap the sentinel values back to np.nan
df = df.replace('AAA', np.nan)
Output:
  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f
The error I get if I do not replace nans is below. I'm going to research exactly where that error stems from.
ValueError: array is not broadcastable to correct shape
I think you can use loc if you need to update two columns to the same value:
df1.loc[df1['stream'] == 2, ['feat', 'another_feat']] = 'aaaa'
print(df1)
stream feat another_feat
a 1 some_value some_value
b 2 aaaa aaaa
c 2 aaaa aaaa
d 3 some_value some_value
If you need to update them separately, one option is:
df1.loc[df1['stream'] == 2, 'feat'] = 10
print(df1)
stream feat another_feat
a 1 some_value some_value
b 2 10 some_value
c 2 10 some_value
d 3 some_value some_value
Another common option is to use numpy.where:
df1['feat'] = np.where(df1['stream'] == 2, 10, 20)
print(df1)
stream feat another_feat
a 1 20 some_value
b 2 10 some_value
c 2 10 some_value
d 3 20 some_value
EDIT: If you need to divide all columns except stream where the condition is True, use:
print(df1)
stream feat another_feat
a 1 4 5
b 2 4 5
c 2 2 9
d 3 1 7
#filter columns all without stream
cols = [col for col in df1.columns if col != 'stream']
print(cols)
['feat', 'another_feat']
df1.loc[df1['stream'] == 2, cols] = df1 / 2
print(df1)
stream feat another_feat
a 1 4.0 5.0
b 2 2.0 2.5
c 2 1.0 4.5
d 3 1.0 7.0
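The divide-in-place step can be verified end to end; a sketch reconstructing df1 from the printout above (float columns are used so no dtype upcast is needed):

```python
import pandas as pd

# df1 as printed above, with feat/another_feat as floats
df1 = pd.DataFrame({'stream': [1, 2, 2, 3],
                    'feat': [4.0, 4.0, 2.0, 1.0],
                    'another_feat': [5.0, 5.0, 9.0, 7.0]},
                   index=list('abcd'))

cols = [col for col in df1.columns if col != 'stream']
# The right-hand side df1 / 2 aligns on index and columns, so only the
# masked rows of the selected columns are actually overwritten
df1.loc[df1['stream'] == 2, cols] = df1 / 2
print(df1)
```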
If you are working with multiple conditions, it is possible to use nested numpy.where calls
or numpy.select:
df0 = pd.DataFrame({'Col':[5,0,-6]})
df0['New Col1'] = np.where(df0['Col'] > 0, 'Increasing',
                           np.where(df0['Col'] < 0, 'Decreasing', 'No Change'))
df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                            ['Increasing', 'Decreasing'],
                            default='No Change')
print(df0)
Col New Col1 New Col2
0 5 Increasing Increasing
1 0 No Change No Change
2 -6 Decreasing Decreasing
You can do the same with .loc (the .ix indexer used in the original answer is deprecated and was removed in pandas 1.0), like this:
In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))
In [2]: df
Out[2]:
a b c d
0 -0.323772 0.839542 0.173414 -1.341793
1 -1.001287 0.676910 0.465536 0.229544
2 0.963484 -0.905302 -0.435821 1.934512
3 0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792 1.862832 0.314315
In [3]: df.loc[df.a>0, ['b','c']] = 0
In [4]: df
Out[4]:
a b c d
0 -0.323772 0.839542 0.173414 -1.341793
1 -1.001287 0.676910 0.465536 0.229544
2 0.963484 0.000000 0.000000 1.934512
3 0.266113 0.000000 0.000000 -0.720599
4 -0.522134 -0.913792 1.862832 0.314315
EDIT
After the extra information, the following will return all columns except a - for rows where the condition is met - with halved values:
>>> condition = df.a > 0
>>> df.loc[condition, [i for i in df.columns if i != 'a']] / 2
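This masked, column-subset read can be checked with a small made-up numeric frame (the frame here is an assumption, not the random one from the transcript above):

```python
import pandas as pd

# Made-up frame: read columns b, c, d halved, for rows where a > 0
df = pd.DataFrame({'a': [-1.0, 2.0],
                   'b': [4.0, 4.0],
                   'c': [6.0, 6.0],
                   'd': [8.0, 8.0]})

condition = df.a > 0
halved = df.loc[condition, df.columns.drop('a')] / 2
print(halved)
```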
You can use np.select for this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})
df['e'] = df.b + df.c + df.d
# list with your conditions
conditions = [(df.a == 'one') & (df.b < 50),
              (df.a == 'two') & (df.d > 50)]
# list with accompanying choices
choices = [0,1]
df['f'] = np.select(conditions, choices, 2)
# 2 being the default: i.e. the 'else' choice.
df
a b c d e f
0 one 45 234 123 402 0
1 one 34 66 45 145 0
2 three 556 12 55 623 2
3 two 32 44 98 174 1
4 eleven 97 99 17 213 2
5 two 33 3 22 58 2
You can use nested np.where methods:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': ['one', 'one', 'three', 'two', 'eleven', 'two'],
                   'b': [45, 34, 556, 32, 97, 33],
                   'c': [234, 66, 12, 44, 99, 3],
                   'd': [123, 45, 55, 98, 17, 22]})
df['e'] = df.b + df.c + df.d
df['f'] = np.where(
    (df.a == 'one') & (df.b < 50),
    0,
    np.where(
        (df.a == 'two') & (df.d > 50),
        1,
        2
    )
)
Output:
a b c d e f
0 one 45 234 123 402 0
1 one 34 66 45 145 0
2 three 556 12 55 623 2
3 two 32 44 98 174 1
4 eleven 97 99 17 213 2
5 two 33 3 22 58 2
So, I've figured out how to use the pandas apply method to update/change the values of a column, row-wise, based on multiple comparisons, like this:
# For each row, if the value of both 'columns to check' is 'SOME STRING',
# change it to 'NEW STRING'; otherwise leave it as is
my_df['column_to_change'] = df.apply(
    lambda row: 'NEW STRING'
    if row['column_to_check_1'] == 'SOME STRING'
    and row['column_to_check_2'] == 'SOME STRING'
    else row['column_to_change'], axis=1)
Now, I can't figure out how to expand that beyond simple comparison operators. The specific example I'm trying to solve is:
" for each row, if the string value in COLUMN A contains 'foo', change the value in COLUMN B to 'bar', otherwise leave it as is"
I think this is all right, except the ##parts between the hashmarks##
my_df['columb_b'] = df.apply(lambda row: 'bar' if ##column A contains 'foo'## else row['columb_b'], axis=1)
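For what it's worth, the "contains" test can be written either row-wise inside apply or, more idiomatically, vectorized with str.contains. A sketch with made-up data (the column names are normalized to column_a/column_b, which is an assumption):

```python
import pandas as pd

# Made-up frame standing in for the question's my_df
df = pd.DataFrame({'column_a': ['foobar', 'baz'],
                   'column_b': ['x', 'y']})

# Row-wise, inside apply: a plain substring test with `in`
df['column_b'] = df.apply(
    lambda row: 'bar' if 'foo' in row['column_a'] else row['column_b'],
    axis=1)

# Vectorized equivalent, no apply needed
df.loc[df['column_a'].str.contains('foo'), 'column_b'] = 'bar'
print(df)
```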
This will work
df.loc[(df.Color == 'Blue') & (df.Age == 28) & (df.City == 'Atl'), 'Value'] = 1
df
Out[687]:
Color Name Age City Value
0 Blue Bob 28 Atl 1
1 Green Bob 27 Chi 0
2 Blue Sam 28 Atl 1
For these problems, I usually default to np.select, so that I can create complex conditions, and set the outputs in a clear and expandable way.
First, create your conditions (Create as many of these as you want):
p1 = df.Color.eq('Blue')
p2 = df.Age.eq(28)
p3 = df.City.eq('Atl')
condition = p1 & p2 & p3
Now using numpy.select, passing a list of your conditions, a list of your matching outputs, and a default value:
df.assign(Value=np.select([condition], [1], df.Value))
Color Name Age City Value
0 Blue Bob 28 Atl 1
1 Green Bob 27 Chi 0
2 Blue Sam 28 Atl 1
If you really only have one condition, you can also use numpy.where here:
np.where(condition, 1, df.Value)
# array([1, 0, 1], dtype=int64)