apply works on a row / column basis of a DataFrame
applymap works element-wise on a DataFrame
map works element-wise on a Series
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
b d e
Utah -0.029638 1.081563 1.280300
Ohio 0.647747 0.831136 -1.549481
Texas 0.513416 -0.884417 0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
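To make the note about built-in statistics concrete, here is a minimal sketch comparing the apply call with an equivalent expression built from DataFrame.max and DataFrame.min. The seeded random generator and the variable names are assumptions made for reproducibility; they are not part of the book excerpt.

```python
import numpy as np
import pandas as pd

# Seeded generator so the numbers are reproducible (an assumption; the excerpt uses unseeded randn).
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.standard_normal((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# apply passes each column to the function as a Series.
range_via_apply = frame.apply(lambda x: x.max() - x.min())

# The same per-column range using built-in DataFrame reductions, no apply needed.
range_vectorised = frame.max() - frame.min()

# axis=1 applies the function to each row instead of each column.
row_range = frame.apply(lambda x: x.max() - x.min(), axis=1)

print(range_via_apply)
print(range_vectorised)
```

Both results should agree; the built-in reductions are usually the better choice when one exists.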
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
b d e
Utah -0.03 1.08 1.28
Ohio 0.65 0.83 -1.55
Texas 0.51 -0.88 0.20
Oregon -0.49 -0.48 -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
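A self-contained sketch of the two element-wise calls above, using the first two rows of the frame shown earlier. One caveat that postdates the book: pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, so the snippet picks whichever method the installed version provides. The name fmt (used instead of format to avoid shadowing the built-in) and the version check are assumptions for illustration.

```python
import pandas as pd

# First two rows of the frame printed above.
frame = pd.DataFrame(
    {"b": [-0.029638, 0.647747], "d": [1.081563, 0.831136], "e": [1.280300, -1.549481]},
    index=["Utah", "Ohio"],
)


def fmt(x):
    """Format one float with two decimal places."""
    return "%.2f" % x


# DataFrame.map exists from pandas 2.1 onward; fall back to applymap on older versions.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap

formatted_frame = elementwise(fmt)      # element-wise over the whole DataFrame
formatted_column = frame["e"].map(fmt)  # element-wise over a single Series

print(formatted_frame)
print(formatted_column)
```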
Answer from jeremiahbuddha on Stack Overflow
Comparing map, applymap and apply: Context Matters
The major differences are:
Definition
map is defined on Series only
applymap is defined on DataFrames only
apply is defined on both
Input argument
map accepts dict, Series, or callable
applymap and apply accept callable only
Behavior
map is elementwise for Series
applymap is elementwise for DataFrames
apply also works elementwise on a Series, while on a DataFrame it passes whole rows or columns to the function. It is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
Use case (the most important difference)
map is meant for mapping values from one domain to another, so is optimised for performance, e.g., df['A'].map({1:'a', 2:'b', 3:'c'})
applymap is good for elementwise transformations across multiple rows/columns, e.g., df[['A', 'B', 'C']].applymap(str.strip)
apply is for applying any function that cannot be vectorised, e.g., df['sentences'].apply(nltk.sent_tokenize)
Also see "When should I (not) want to use pandas apply() in my code?" for a writeup I made a while back on the most appropriate scenarios for using apply. (Note that there aren't many, but there are a few; apply is generally slow.)
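The three argument types that map accepts, plus an element-wise string cleanup across columns, can be sketched as follows. The sample data and variable names are hypothetical, and the hasattr check is an assumption added so the snippet runs both before and after pandas 2.1, where DataFrame.applymap was deprecated in favour of DataFrame.map.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], name="A")

# dict: values are looked up by key; 4 has no key, so it becomes NaN.
by_dict = s.map({1: "a", 2: "b", 3: "c"})

# Series: the index of the mapping Series plays the role of the dict keys.
lookup = pd.Series(["a", "b", "c"], index=[1, 2, 3])
by_series = s.map(lookup)

# callable: invoked once per element.
by_callable = s.map(lambda v: v * 10)

# Element-wise transformation across several columns of a DataFrame.
df = pd.DataFrame({"A": [" x ", "y "], "B": [" z", "w"]})
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
stripped = elementwise(str.strip)

print(by_dict)
print(by_callable)
print(stripped)
```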
Summarising
| | map | applymap | apply |
|---|---|---|---|
| Defined on Series? | Yes | No | Yes |
| Defined on DataFrame? | No | Yes | Yes |
| Argument | dict, Series, or callable [1] | callable [2] | callable |
| Elementwise? | Yes | Yes | Yes |
| Aggregation? | No | No | Yes |
| Use Case | Transformation/mapping [3] | Transformation | More complex functions |
| Returns | Series | DataFrame | scalar, Series, or DataFrame [4] |
Footnotes
[1] map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
[2] applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
[3] map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
[4] Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
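As a rough illustration of how the return type of apply follows from the function it is given, the sketch below runs an aggregating and a transforming function over the same frame. The data and names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Aggregating function: each column collapses to one value, so the result is a Series.
col_max = df.apply(lambda col: col.max())

# Transforming function: each column keeps its length, so the result is a DataFrame.
centred = df.apply(lambda col: col - col.mean())

# Series.apply with a scalar function is called once per element and returns a Series.
squared = df["A"].apply(lambda v: v ** 2)

print(type(col_max).__name__, type(centred).__name__, type(squared).__name__)
```

For these particular operations the built-in equivalents (df.max(), df - df.mean(), df["A"] ** 2) would normally be preferred; apply is used here only to show the return types.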
Another solution is to use DataFrame.any to check for at least one True per row:
print(df[['h1', 'h5']].apply(lambda x: x.str.contains('A')))
h1 h5
0 True False
1 False False
2 False True
print(df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1))
0 True
1 False
2 True
dtype: bool
df['new'] = np.where(df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1),
                     'Label', '')
print(df)
h1 h2 h3 h4 h5 new
0 A B C D Z Label
1 E A G H Y
2 I J K L A Label
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1)
df.loc[mask, 'New'] = 'Label'
print(df)
h1 h2 h3 h4 h5 New
0 A B C D Z Label
1 E A G H Y NaN
2 I J K L A Label
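For reference, here is a self-contained sketch of both variants above. The frame is reconstructed from the printed output; the imports and the keyword form any(axis=1) are additions (pandas 2.0 and later no longer accept the axis positionally for any).

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the printed output above.
df = pd.DataFrame(
    {
        "h1": ["A", "E", "I"],
        "h2": ["B", "A", "J"],
        "h3": ["C", "G", "K"],
        "h4": ["D", "H", "L"],
        "h5": ["Z", "Y", "A"],
    }
)

# Column-wise apply gives a boolean DataFrame; any(axis=1) collapses it to one flag per row.
mask = df[["h1", "h5"]].apply(lambda col: col.str.contains("A")).any(axis=1)

# Variant 1: np.where fills non-matching rows with an empty string.
df["new"] = np.where(mask, "Label", "")

# Variant 2: .loc assignment leaves non-matching rows as missing values.
df.loc[mask, "New"] = "Label"

print(df)
```

The two variants differ only in what the non-matching rows contain: an empty string or a missing value.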
pd.DataFrame.apply iterates over each column by default, passing the column as a pd.Series to the function being applied. In your case, the function you're trying to apply doesn't lend itself to being used column by column.
Do this instead to get your idea to work (axis=1 passes each row to the function):
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A').any(), axis=1)
df.loc[mask, 'New Column'] = 'Label'
h1 h2 h3 h4 h5 New Column
0 A B C D Z Label
1 E A G H Y NaN
2 I J K L A Label
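To see what DataFrame.apply hands to the function under each axis setting, the sketch below records the name of every Series it receives. The two-column frame and the list names are hypothetical, chosen to mirror the h1 and h5 columns above.

```python
import pandas as pd

df = pd.DataFrame({"h1": ["A", "E", "I"], "h5": ["Z", "Y", "A"]})

seen_default = []
seen_rows = []

# Default axis=0: the function receives one column at a time; .name is the column label.
df.apply(lambda s: seen_default.append(s.name))

# axis=1: the function receives one row at a time; .name is the row's index label.
df.apply(lambda s: seen_rows.append(s.name), axis=1)

# Row-wise reduction to a single boolean per row, as in the answer above.
mask = df.apply(lambda row: row.str.contains("A").any(), axis=1)

print(seen_default)
print(seen_rows)
print(mask)
```

With axis=1 the lambda sees one row at a time, so calling .any() inside it yields one boolean per row, which is what the .loc assignment needs.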