apply works on a row / column basis of a DataFrame
applymap works element-wise on a DataFrame
map works element-wise on a Series
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
b d e
Utah -0.029638 1.081563 1.280300
Ohio 0.647747 0.831136 -1.549481
Texas 0.513416 -0.884417 0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
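To make that concrete, here is a small sketch (reusing the frame construction from above) showing that the range computed with apply matches the built-in methods, and that passing axis=1 applies the function once per row instead of once per column:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

f = lambda x: x.max() - x.min()

# the per-column range via apply...
via_apply = frame.apply(f)

# ...equals the same statistic from built-in DataFrame methods
via_methods = frame.max() - frame.min()
print(via_apply.equals(via_methods))  # True

# axis=1 (or axis='columns') applies the function across each row instead
per_row = frame.apply(f, axis=1)
print(per_row.index.tolist())  # ['Utah', 'Ohio', 'Texas', 'Oregon']
```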
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
b d e
Utah -0.03 1.08 1.28
Ohio 0.65 0.83 -1.55
Texas 0.51 -0.88 0.20
Oregon -0.49 -0.48 -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
Answer from jeremiahbuddha on Stack Overflow
Comparing map, applymap and apply: Context Matters
The major differences are:
Definition
- map is defined on Series only
- applymap is defined on DataFrames only
- apply is defined on both
Input argument
- map accepts a dict, Series, or callable
- applymap and apply accept a callable only
Behavior
- map is elementwise for Series
- applymap is elementwise for DataFrames
- apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
Use case (the most important difference)
- map is meant for mapping values from one domain to another, so it is optimised for performance, e.g., df['A'].map({1: 'a', 2: 'b', 3: 'c'})
- applymap is good for elementwise transformations across multiple rows/columns, e.g., df[['A', 'B', 'C']].applymap(str.strip)
- apply is for applying any function that cannot be vectorised, e.g., df['sentences'].apply(nltk.sent_tokenize)
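As a runnable sketch of those three use cases (the column names are made up, and nltk.sent_tokenize is swapped for plain len so the snippet has no extra dependencies):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': ['  x ', ' y', 'z  '],
                   'C': ['p ', ' q', 'r']})

# map: value-to-value mapping from a dict
mapped = df['A'].map({1: 'a', 2: 'b', 3: 'c'})
print(mapped.tolist())  # ['a', 'b', 'c']

# applymap: the same elementwise function over several columns
stripped = df[['B', 'C']].applymap(str.strip)
print(stripped['B'].tolist())  # ['x', 'y', 'z']

# apply: an arbitrary per-element function on a Series
lengths = df['B'].apply(len)
print(lengths.tolist())  # [4, 2, 3]
```

Note that recent pandas releases added DataFrame.map as the preferred spelling of applymap; applymap still works but may emit a deprecation warning.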
Also see When should I (not) want to use pandas apply() in my code? for a writeup I made a while back on the most appropriate scenarios for using apply. (Note that there aren't many, but there are a few; apply is generally slow.)
Summarising
|                       | map                        | applymap       | apply                         |
|-----------------------|----------------------------|----------------|-------------------------------|
| Defined on Series?    | Yes                        | No             | Yes                           |
| Defined on DataFrame? | No                         | Yes            | Yes                           |
| Argument              | dict, Series, or callable¹ | callable²      | callable                      |
| Elementwise?          | Yes                        | Yes            | Yes                           |
| Aggregation?          | No                         | No             | Yes                           |
| Use Case              | Transformation/mapping³    | Transformation | More complex functions        |
| Returns               | Series                     | DataFrame      | scalar, Series, or DataFrame⁴ |
Footnotes
1. map, when passed a dictionary/Series, will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
4. Series.apply returns a scalar for aggregating operations, a Series otherwise. Similarly for DataFrame.apply. Note that apply also has fast paths when called with certain NumPy functions such as mean, sum, etc.
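A quick sketch of footnote 1: with a dict argument, keys missing from the mapping come back as NaN:

```python
import math

import pandas as pd

s = pd.Series([1, 2, 3, 4])

# 4 has no entry in the mapping, so it becomes NaN in the output
out = s.map({1: 'a', 2: 'b', 3: 'c'})
print(out.tolist()[:3])             # ['a', 'b', 'c']
print(math.isnan(out.tolist()[3]))  # True
```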
These methods have different use cases. When comparing applymap and transform, it is useful to bring up apply and agg as well.
Setup
import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(6, 4)), columns=list('ABCD'))
df
A B C D
0 0 2 7 3
1 8 7 0 6
2 8 6 0 2
3 0 4 9 7
4 3 2 4 3
5 3 6 7 7
pd.DataFrame.applymap
This takes a function and returns a new DataFrame in which each cell holds the result of applying that function to the corresponding cell of the original.
df.applymap(lambda x: str(x) * x)
          A        B          C        D
0                 22    7777777      333
1  88888888  7777777              666666
2  88888888   666666                  22
3              4444  999999999  7777777
4       333       22       4444      333
5       333   666666    7777777  7777777
(The blank cells are empty strings: str(x) * x with x == 0 yields ''.)
pd.DataFrame.agg
Takes one or more functions. Each function is expected to be an aggregation: applied to a column, it returns a single value that summarises the entire column. Examples would be 'mean' or 'max'; both take a set of data and return a scalar.
df.agg('mean')
A 3.666667
B 4.500000
C 4.500000
D 4.666667
dtype: float64
Or
df.agg(['mean', 'std', 'min'])
A B C D
mean 3.666667 4.500000 4.500000 4.666667
std 3.614784 2.167948 3.834058 2.250926
min 0.000000 2.000000 0.000000 2.000000
pd.DataFrame.transform
Takes one function that is expected to be applied to a column and return a column of equal size.
df.transform(lambda x: x / x.std())
A B C D
0 0.000000 0.922531 1.825742 1.332785
1 2.213133 3.228859 0.000000 2.665570
2 2.213133 2.767594 0.000000 0.888523
3 0.000000 1.845062 2.347382 3.109832
4 0.829925 0.922531 1.043281 1.332785
5 0.829925 2.767594 1.825742 3.109832
pd.DataFrame.apply
pandas attempts to figure out if apply is reducing the dimensionality of the column it was operating on (aka, aggregation) or if it is transforming the column into another column of equal size. When it figures it out, it runs the remainder of the operation as if it were an aggregation or transform procedure.
df.apply('mean')
A 3.666667
B 4.500000
C 4.500000
D 4.666667
dtype: float64
Or
df.apply(lambda x: (x - x.mean()) / x.std())
A B C D
0 -1.014353 -1.153164 0.652051 -0.740436
1 1.198781 1.153164 -1.173691 0.592349
2 1.198781 0.691898 -1.173691 -1.184698
3 -1.014353 -0.230633 1.173691 1.036611
4 -0.184428 -1.153164 -0.130410 -0.740436
5 -0.184428 0.691898 0.652051 1.036611
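The detection described above shows up in apply's return type. Here is a sketch (reusing the seeded df from the Setup section) where a reducing function yields a Series and a like-sized function yields a DataFrame:

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randint(10, size=(6, 4)), columns=list('ABCD'))

# a reducing function -> one value per column, returned as a Series
reduced = df.apply(lambda x: x.max())
print(type(reduced).__name__)          # Series

# a like-sized function -> a DataFrame with the same shape as the input
standardised = df.apply(lambda x: (x - x.mean()) / x.std())
print(type(standardised).__name__)     # DataFrame
print(standardised.shape == df.shape)  # True
```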
What's meant by ".transform() returns a like-indexed DataFrame", as stated in the documentation?
It means that .transform() applies a function to every value (or to each group, when preceded by groupby) in the DataFrame and returns another DataFrame with the same length as the input. To emphasise: it keeps the input index labels in the output.
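A minimal sketch of "like-indexed": the output of transform carries exactly the input's index labels:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])

# transform returns a frame of the same length with the same index
out = df.transform(lambda s: s * 10)
print(out.index.tolist())  # ['a', 'b', 'c']
print(out['x'].tolist())   # [10, 20, 30]
```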
Is there any use case where one of applymap/transform works and the other doesn't?
Sure. Here are some examples:
1) applymap Vs transform
Since applymap is defined only on DataFrames, you can't call applymap on a Series:
df['Quantity'].transform(lambda x: x+10) # successful
df['Quantity'].apply(lambda x: x+10) # successful
df['Quantity'].applymap(lambda x: x+10) # gives AttributeError: 'Series' object has no attribute 'applymap'
# unless you cast it to DataFrame:
pd.DataFrame(df['Quantity']).applymap(lambda x: x+10) # successful
Another important difference is that, unlike .applymap(), which only operates element-wise, .transform() can also perform group-wise operations, as shown in the next part.
Moreover, applymap cannot be preceded by groupby.
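Both limitations can be checked directly; a sketch with a toy frame (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Label': ['A', 'B', 'A'], 'Quantity': [5, 6, 8]})

# a Series has no applymap...
print(hasattr(df['Quantity'], 'applymap'))  # False

# ...and neither does a GroupBy object, whereas transform is available
gb = df.groupby('Label')
print(hasattr(gb, 'applymap'))              # False
print(hasattr(gb, 'transform'))             # True
```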
2) apply Vs transform
apply and transform are interchangeable as long as you use them on DataFrame column(s) with a function that returns a like-sized result. Here is a simple example:
# imagine the following DataFrame
df = pd.DataFrame({'Label': ['A', 'B', 'C', 'A', 'C'],
'Values': [0,1,2,3,4],
'Quantity': [5,6,7,8,9]}, index = list('VWXYZ'))
Label Quantity Values
---------------------------------
V A 5 0
W B 6 1
X C 7 2
Y A 8 3
Z C 9 4
df.loc[:, ['Quantity', 'Values']].apply(lambda x: x+10)
df.loc[:, ['Quantity', 'Values']].transform(lambda x: x+10)
# both of them give the following same result:
Quantity Values
-------------------------
V 15 10
W 16 11
X 17 12
Y 18 13
Z 19 14
The main difference emerges once they follow a groupby operation. For instance:
label_grouping = df.groupby('Label')
label_grouping.apply(lambda x: x.mean())
# output:
Quantity Values
Label
-----------------------
A 6.5 1.5
B 6.0 1.0
C 8.0 3.0
label_grouping.transform(lambda x: x.mean())
# see how `transform` manages to keep the input index labels in the output
# output:
Quantity Values
------------------------
V 6.5 1.5
W 6.0 1.0
X 8.0 3.0
Y 6.5 1.5
Z 8.0 3.0
The above example clearly shows how transform can retain the input DataFrame's index. To make the most of this exclusive feature, the following short example shows how to benefit from the alignment of indexes between the input and output of a transform operation, by calculating the percentage of the order total that each product represents:
df_sales = pd.DataFrame({'OrderID': [1001,1001,1001,1002,1002],
'Product': ['p1','p2','p3','p1','p4'],
'Quantity': [30,20,70,160,40]})
OrderID Product Quantity
-----------------------------------
0 1001 p1 30
1 1001 p2 20
2 1001 p3 70
3 1002 p1 160
4 1002 p4 40
df_sales['total_per_order'] = df_sales.groupby(['OrderID'])['Quantity'].transform(lambda x: x.sum())
df_sales['pct_of_order'] = df_sales['Quantity'] / df_sales['total_per_order']
OrderID Product Quantity total_per_order pct_of_order
----------------------------------------------------------------------
0 1001 p1 30 120 0.250000
1 1001 p2 20 120 0.166667
2 1001 p3 70 120 0.583333
3 1002 p1 160 200 0.800000
4 1002 p4 40 200 0.200000
It's highly advised to follow this link for a more detailed example: https://pbpython.com/pandas_transform.html
Many aggregation functions are built directly into the groupby object to save you some typing. Some common ones to benefit from are (prefixed by gb, i.e. gb = df.groupby(...)):
- gb.apply
- gb.transform
- gb.filter
- gb.agg
- gb.count
- gb.cumsum
- gb.fillna
- ...
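For example, the total_per_order column computed earlier with transform(lambda x: x.sum()) can use the built-in reduction by name, which is typically faster:

```python
import pandas as pd

df_sales = pd.DataFrame({'OrderID': [1001, 1001, 1001, 1002, 1002],
                         'Product': ['p1', 'p2', 'p3', 'p1', 'p4'],
                         'Quantity': [30, 20, 70, 160, 40]})

# passing the function name lets pandas use its fast built-in sum
df_sales['total_per_order'] = (df_sales.groupby('OrderID')['Quantity']
                               .transform('sum'))
print(df_sales['total_per_order'].tolist())  # [120, 120, 120, 200, 200]
```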
Hope this helped :)