So, I've figured out how to use the pandas apply method to update the values of a column row-wise, based on multiple comparisons, like this:
# For each row, if the values of both 'columns to check' are 'SOME STRING', change to 'NEW STRING';
# otherwise leave it as is.
my_df['column_to_change'] = my_df.apply(lambda row: 'NEW STRING' if row['column_to_check_1'] == 'SOME STRING' and row['column_to_check_2'] == 'SOME STRING' else row['column_to_change'], axis=1)
Now, I can't figure out how to expand that beyond simple comparison operators. The specific example I'm trying to solve is:
"For each row, if the string value in COLUMN A contains 'foo', change the value in COLUMN B to 'bar'; otherwise leave it as is."
I think this is all right, except the ##part between the hashmarks##:
my_df['column_b'] = my_df.apply(lambda row: 'bar' if ##column A contains 'foo'## else row['column_b'], axis=1)
Use the apply method:
In [80]: x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
In [81]: df = pd.DataFrame(x)
In [82]: df.Value.apply(lambda x: np.nan if x.startswith('XXX') else x)
Out[82]:
0    Test
1     NaN
2     NaN
3    Test
Name: Value, dtype: object
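The same pattern extends to the "contains" case from the question. What follows is only a minimal sketch: the column names column_a and column_b and the sample rows are placeholders invented for illustration. Python's `in` operator does the substring test inside the row-wise lambda, and a boolean mask built with `str.contains` gives the same result without going row by row.

```python
import pandas as pd

# Hypothetical sample data; the column names are placeholders, not from the question.
df = pd.DataFrame({
    "column_a": ["foo fighters", "spam", "seafood", "eggs"],
    "column_b": ["w", "x", "y", "z"],
})

# Row-wise version: `in` tests whether 'foo' is a substring of column_a.
via_apply = df.apply(
    lambda row: "bar" if "foo" in row["column_a"] else row["column_b"],
    axis=1,
)

# Vectorised version: build a boolean mask, then replace only the matching rows.
# regex=False makes 'foo' a literal substring rather than a pattern.
contains_foo = df["column_a"].str.contains("foo", regex=False)
via_mask = df["column_b"].mask(contains_foo, "bar")

df["column_b"] = via_mask
print(df)
```

Both versions assume column_a holds only strings; a missing value would need handling first, for example with the `na=False` argument of `str.contains`.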
Performance comparison of apply, where, loc

np.where() performs way better here:
df.Value = np.where(df.Value.str.startswith('XXX'), np.nan, df.Value)
Performance vs apply on larger dfs:
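The original timing output is not reproduced here. As a rough sketch of how such a comparison could be run, the snippet below times both approaches with `timeit`. The frame size, the helper names `with_apply` and `with_where`, and the number of repetitions are arbitrary choices made for illustration, and the figures will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: the four-row example repeated 25,000 times.
big = pd.DataFrame({"Value": ["Test", "XXX123", "XXX456", "Test"] * 25_000})


def with_apply(frame):
    # Element-wise Python call for every row.
    return frame["Value"].apply(lambda v: np.nan if v.startswith("XXX") else v)


def with_where(frame):
    # One vectorised string test, then a single np.where over the whole column.
    starts_with_xxx = frame["Value"].str.startswith("XXX")
    return pd.Series(
        np.where(starts_with_xxx, np.nan, frame["Value"]),
        index=frame.index,
        name="Value",
    )


# Check that both approaches blank out the same rows before timing them.
assert with_apply(big).isna().equals(with_where(big).isna())

runs = 5
for fn in (with_apply, with_where):
    seconds = timeit.timeit(lambda: fn(big), number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per run")
```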

Given that your DataFrame is named data, use the apply() calls below.
For a column whose cells each hold a list of space-separated strings:
data['New_instructions'] = data['instructions'].apply(lambda x: [i.split()[0].strip() for i in x])
For a column whose cells each hold a single string:
data['New_instructions'] = data['instructions'].apply(lambda x: x.split()[0].strip())
Use a lambda function as follows:
dataFrame['opcodes'] = dataFrame['instructions'].apply(lambda x:[i.split()[0] for i in x])
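To see what that lambda produces, here is a small runnable sketch. The assembly-like instruction strings are made up for illustration, since the original data is not shown.

```python
import pandas as pd

# Hypothetical data: each cell holds a list of instruction strings.
dataFrame = pd.DataFrame({
    "instructions": [
        ["mov eax, 1", "add eax, 2"],
        ["push ebp", "ret"],
    ]
})

# Keep only the first whitespace-separated token (the opcode) of every instruction.
dataFrame["opcodes"] = dataFrame["instructions"].apply(
    lambda instrs: [instr.split()[0] for instr in instrs]
)
print(dataFrame["opcodes"].tolist())
```

Note that `split()` with no arguments already discards surrounding whitespace, so an extra `.strip()` on the token changes nothing.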
As @DSM points out, you can do this more directly using the vectorised string methods:
df['Date'].str[-4:].astype(int)
Or using extract (assuming there is only one set of digits of length 4 somewhere in each string):
df['Date'].str.extract(r'(?P<year>\d{4})').astype(int)
An alternative, slightly more flexible way might be to use apply (or, equivalently, map):
df['Date'] = df['Date'].apply(lambda x: int(str(x)[-4:]))
# converts the last 4 characters of the string to an integer
The lambda function takes each value from the Date column and converts it to a year.
You could (and perhaps should) write this more verbosely as:
def convert_to_year(date_in_some_format):
    date_as_string = str(date_in_some_format)  # cast to string
    year_as_string = date_as_string[-4:]  # last four characters
    return int(year_as_string)
df['Date'] = df['Date'].apply(convert_to_year)
Perhaps 'Year' is a better name for this column...
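A quick way to convince yourself that the three approaches agree is to run them side by side. This is only a sketch: the date strings below are invented, and it assumes every value ends in a four-digit year and contains no other run of four digits.

```python
import pandas as pd

# Hypothetical sample: dates stored as strings that end in a four-digit year.
df = pd.DataFrame({"Date": ["01/02/2013", "15/07/1999", "31/12/2020"]})


def convert_to_year(date_in_some_format):
    date_as_string = str(date_in_some_format)  # cast to string
    year_as_string = date_as_string[-4:]       # last four characters
    return int(year_as_string)


# Vectorised slice of the last four characters.
by_slice = df["Date"].str[-4:].astype(int)

# Regex extraction; the named group becomes the column 'year' of the result.
by_extract = df["Date"].str.extract(r"(?P<year>\d{4})")["year"].astype(int)

# Element-wise conversion with a plain Python function.
by_apply = df["Date"].apply(convert_to_year)

df["Year"] = by_slice
print(df)
```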
You can do a column transformation by using apply.
Define a clean function that removes dollar signs, commas and spaces, and converts the value to float.
def clean(x):
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)
Next, call it on your column like this.
data['Revenue'] = data['Revenue'].apply(clean)
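As a sketch of how this behaves, the snippet below runs clean on a few invented revenue strings and compares it with a vectorised alternative built on `str.replace`. The sample values and the variable names `via_apply` and `via_str` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample: revenue stored as formatted strings.
data = pd.DataFrame({"Revenue": ["$1,234.50", "$ 99", "$12,000,000"]})


def clean(x):
    # Drop dollar signs, thousands separators and spaces, then parse the number.
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)


via_apply = data["Revenue"].apply(clean)

# Vectorised alternative: one regex character class removes the same characters.
via_str = data["Revenue"].str.replace(r"[$, ]", "", regex=True).astype(float)

data["Revenue"] = via_apply
print(data)
```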
You have to assign the result back with the = operator. apply does not modify the column in place; it returns a new Series, so you have to store that result in the column: Energy['Energy Supply'] = ....
Energy['Energy Supply'] = Energy['Energy Supply'].apply(lambda x: x*(10**6))
Energy.head()
You don't need to use apply; just use a compound assignment operator:
Energy['Energy Supply'] *= 1e6
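The difference between calling apply and assigning its result can be checked directly. This is a minimal sketch with invented values; the names `untouched` and `via_apply` are illustrative only.

```python
import pandas as pd

# Hypothetical values standing in for the real 'Energy Supply' column.
Energy = pd.DataFrame({"Energy Supply": [1.5, 20.0, 0.25]})

# Calling apply without assigning the result leaves the frame unchanged.
untouched = Energy.copy()
untouched["Energy Supply"].apply(lambda x: x * 10**6)  # return value discarded

# apply returns a new Series that has to be stored explicitly...
via_apply = Energy["Energy Supply"].apply(lambda x: x * 10**6)

# ...while the compound assignment computes and stores in one statement.
Energy["Energy Supply"] *= 1e6

print(untouched["Energy Supply"].tolist())
print(Energy["Energy Supply"].tolist())
```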
Given a sample dataframe df as:
   a  b
0  1  2
1  2  3
2  3  4
3  4  5
what you want is:
df['a'] = df['a'].apply(lambda x: x + 1)
that returns:
   a  b
0  2  2
1  3  3
2  4  4
3  5  5
For a single column it is better to use map(), like this:
df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])
    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9
df['a'] = df['a'].map(lambda a: a / 2.)
      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9
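For an element-wise function on one column, map and apply give the same result, and simple arithmetic like this can also be written without a lambda at all. A short sketch on the same frame (the variable names `via_map`, `via_apply` and `via_division` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 15, 'b': 15, 'c': 5},
    {'a': 20, 'b': 10, 'c': 7},
    {'a': 25, 'b': 30, 'c': 9},
])

# Element-wise with map, element-wise with apply, and plain vectorised division.
via_map = df['a'].map(lambda a: a / 2.)
via_apply = df['a'].apply(lambda a: a / 2.)
via_division = df['a'] / 2.

df['a'] = via_division
print(df)
```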