apply works on a row / column basis of a DataFrame
applymap works element-wise on a DataFrame
map works element-wise on a Series
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
b d e
Utah -0.029638 1.081563 1.280300
Ohio 0.647747 0.831136 -1.549481
Texas 0.513416 -0.884417 0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
b d e
Utah -0.03 1.08 1.28
Ohio 0.65 0.83 -1.55
Texas 0.51 -0.88 0.20
Oregon -0.49 -0.48 -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
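As a small illustration of the remark that apply is often unnecessary for common statistics, the sketch below computes the same max-minus-min range with and without apply. The frame here uses fixed, made-up numbers (not the random values from the book excerpt) so the results are reproducible.

```python
import pandas as pd

# Hypothetical fixed data, chosen only so the arithmetic is easy to follow.
frame = pd.DataFrame(
    [[1.0, 4.0, -2.0], [3.0, 0.5, 5.0], [-1.0, 2.0, 0.0]],
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas"],
)

# apply passes each column (a Series) to the function.
via_apply = frame.apply(lambda x: x.max() - x.min())

# The same result using built-in DataFrame methods, no apply needed.
via_methods = frame.max() - frame.min()

# axis=1 passes each row to the function instead of each column.
per_row = frame.apply(lambda x: x.max() - x.min(), axis=1)

print(via_apply)
print(per_row)
```

The built-in methods are usually the better choice when they exist; apply is mainly useful when the per-column or per-row logic has no ready-made equivalent.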
Comparing map, applymap and apply: Context Matters
The major differences are:
Definition
- map is defined on Series only
- applymap is defined on DataFrames only
- apply is defined on both
Input argument
- map accepts dict, Series, or callable
- applymap and apply accept callable only
Behavior
- map is elementwise for Series
- applymap is elementwise for DataFrames
- apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
Use case (the most important difference)
- map is meant for mapping values from one domain to another, so is optimised for performance, e.g., df['A'].map({1:'a', 2:'b', 3:'c'})
- applymap is good for elementwise transformations across multiple rows/columns, e.g., df[['A', 'B', 'C']].applymap(str.strip)
- apply is for applying any function that cannot be vectorised, e.g., df['sentences'].apply(nltk.sent_tokenize)
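The three use cases above can be sketched on a toy frame. This is only an illustration under a few assumptions: the column names and values are made up, a plain str.split stands in for nltk.sent_tokenize so the example has no extra dependency, and the small elementwise helper is hypothetical. The helper exists because DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so it calls whichever one the installed version provides.

```python
import pandas as pd

def elementwise(df, func):
    """Hypothetical helper: apply func to every cell of df.

    DataFrame.applymap was renamed to DataFrame.map in pandas 2.1,
    so use whichever method the installed version has.
    """
    if hasattr(pd.DataFrame, "map"):
        return df.map(func)
    return df.applymap(func)

# Made-up data for illustration only.
df = pd.DataFrame({
    "A": [1, 2, 4],
    "B": [" x ", "y ", " z"],
    "C": ["a b", "c", "d e f"],
})

# map: translate values through a dict; 4 has no key, so it becomes NaN.
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# applymap-style: the same elementwise function over every cell of a sub-frame.
stripped = elementwise(df[["B"]], str.strip)

# apply: an arbitrary Python function per element of a Series
# (str.split is a stand-in for something like nltk.sent_tokenize).
token_counts = df["C"].apply(lambda s: len(s.split()))

print(letters)
print(stripped)
print(token_counts)
```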
Also see When should I (not) want to use pandas apply() in my code? for a writeup I made a while back on the most appropriate scenarios for using apply. (Note that there aren't many, but there are a few; apply is generally slow.)
Summarising
| | map | applymap | apply |
|---|---|---|---|
| Defined on Series? | Yes | No | Yes |
| Defined on DataFrame? | No | Yes | Yes |
| Argument | dict, Series, or callable [1] | callable [2] | callable |
| Elementwise? | Yes | Yes | Yes |
| Aggregation? | No | No | Yes |
| Use Case | Transformation/mapping [3] | Transformation | More complex functions |
| Returns | Series | DataFrame | scalar, Series, or DataFrame [4] |
Footnotes
1. map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
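Two of the footnotes can be checked on toy data: map with a Series as the lookup (unmatched values become NaN), and how the return type of DataFrame.apply follows from what the function returns. The data below is made up, and the sketch deliberately covers only DataFrame.apply, not Series.apply.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "z"])

# map with a Series: each value of s is looked up in the index of `lookup`.
# The value 3 has no matching index label, so the result is NaN there.
lookup = pd.Series(["one", "two"], index=[1, 2])
mapped = s.map(lookup)

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function reduces each column to one value -> apply returns a Series.
col_sums = df.apply(lambda col: col.sum())

# The function returns a same-length Series per column -> apply returns a DataFrame.
doubled = df.apply(lambda col: col * 2)

print(mapped)
print(type(col_sums).__name__, type(doubled).__name__)
```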
I've been doing a Kaggle course about pandas and found a line I don't really understand, so I was hoping someone could help me out a bit.
The line would be this:
n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
It wants to count the number of times 'tropical' appears in the description column of a table.
What does 'desc' stand for? Is it description? In that case, can I shorten column names in pandas whenever I feel like it?
I believe I may have a problem with lambdas but I'm quite lost here.
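One way to read the line in question, sketched on a made-up stand-in for the reviews table: desc is simply the name chosen for the lambda's parameter, and map calls the lambda once per value in the description column. It is not a shortened column name, and any other parameter name would behave the same. Note that the expression counts how many descriptions contain "tropical", not how many times the word occurs in total.

```python
import pandas as pd

# Hypothetical stand-in for the course's `reviews` table.
reviews = pd.DataFrame({
    "description": [
        "tropical fruit notes",
        "dry and crisp",
        "tropical, tropical finish",
    ]
})

# Original form: `desc` receives one description string per call.
n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()

# Same logic with a named function and a differently named parameter.
def mentions_tropical(text):
    return "tropical" in text

n_trop_named = reviews["description"].map(mentions_tropical).sum()

# Summing booleans counts the True values, i.e. the matching rows.
print(n_trop, n_trop_named)
```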