If you have to use the "apply" variant, the code should be:
df['product_AH'] = df.apply(lambda row: row.Age * row.Height, axis=1)
The parameter to the function applied is the whole row.
But a much quicker solution is:
df['product_AH'] = df.Age * df.Height
(1.43 ms, compared to 5.08 ms for the "apply" variant).
This way the computation is vectorized, whereas apply visits each row separately, applies the function to it, then assembles all the results and saves them in the target column, which is considerably slower.
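To check the difference on your own machine, a minimal sketch along these lines could be used. The sample data and the repeat count are made up for illustration, and the timings will vary with hardware and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical sample data; the column names follow the answer above.
df = pd.DataFrame({"Age": list(range(20, 1020)), "Height": [170] * 1000})

def with_apply():
    # The lambda is called once per row (axis=1 passes each row as a Series).
    return df.apply(lambda row: row.Age * row.Height, axis=1)

def vectorized():
    # The two columns are multiplied as whole arrays.
    return df.Age * df.Height

# Both approaches should produce the same values.
same_values = bool((with_apply() == vectorized()).all())

# Time 10 runs of each variant.
t_apply = timeit.timeit(with_apply, number=10)
t_vec = timeit.timeit(vectorized, number=10)
print(f"apply: {t_apply:.4f} s, vectorized: {t_vec:.4f} s")
```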
Answer from Valdi_Bo on Stack Overflow
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df.apply(lambda x: 1, axis=1)
This works just fine: it returns a Series with three rows, each equal to 1.
However this line
df['A'].apply(lambda x: 1, axis=1)
gives the error <lambda>() got an unexpected keyword argument 'axis'. Shouldn't the behavior be the same?
It seems like in df.apply(lambda x: 1, axis=1) axis=1 is parsed as an argument to the pandas apply() method, but in df['A'].apply(lambda x: 1, axis=1) the axis=1 is getting parsed as being part of the lambda. Why is the parsing different just because I'm indexing a column?
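As far as the pandas API goes, DataFrame.apply has its own axis parameter, while Series.apply has none and forwards any extra keyword arguments to the function, which would explain the error. The sketch below illustrates this under that reading; the `offset` parameter is a hypothetical name used only for the example.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# DataFrame.apply consumes axis itself: axis=1 means "call the function per row".
per_row = df.apply(lambda row: 1, axis=1)

# A Series has only one axis, so Series.apply is called without axis.
per_value = df["A"].apply(lambda x: 1)

# Extra keyword arguments given to Series.apply are forwarded to the function.
# `offset` is a hypothetical parameter name used only for this illustration.
shifted = df["A"].apply(lambda x, offset: x + offset, offset=10)
print(shifted.tolist())
```

So in df['A'].apply(lambda x: 1, axis=1), axis=1 is handed to the lambda, which does not accept it.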
I've been using Python and pandas at work for a couple of months now, and I just realized that df[df['Series'].apply(lambda x: [conditions])] is becoming my go-to solution for more complex filters. I just find the syntax simple to use and understand.
My question is: are there any downsides to this? I'm aware that using a lambda function when there may already be a method for what I want is reinventing the wheel, but I'm new to Python and still learning all the methods, so I'm mostly wondering how it might affect performance and readability, or whether it's more of an "if it works, it works" situation.
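For comparison, here is a minimal sketch of the same filter written three ways. The data and the range condition are hypothetical, chosen only to show that an apply-based filter and a vectorized boolean mask select the same rows.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["ann", "bob", "cy", "dee"], "score": [55, 82, 91, 40]})

# Filter with apply + lambda: the lambda runs once per value.
with_apply = df[df["score"].apply(lambda x: 50 <= x < 90)]

# Equivalent vectorized filter: the comparisons run on the whole column at once.
with_mask = df[(df["score"] >= 50) & (df["score"] < 90)]

# Series.between is a built-in range check; inclusive="left" keeps 50 and drops 90
# (the string form of `inclusive` assumes pandas 1.3 or later).
with_between = df[df["score"].between(50, 90, inclusive="left")]
```

The lambda version stays handy for conditions that have no vectorized equivalent, but for simple comparisons the mask avoids the per-value Python call.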
Alternatively, you can use loc:
import pandas as pd
df = pd.DataFrame({"age": [-100, 300, 400, 500, 600, 700]})
# Select rows and column in a single .loc call to avoid chained assignment.
df.loc[(df["age"] < 500) & (df["age"] >= 0), "age"] = 0
Now your df looks like this:
   age
0 -100
1    0
2    0
3  500
4  600
5  700
You can use a nested list comprehension within the lambda function.
Or
Write a function and call it on your Series from the lambda.
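Both options can be sketched as follows. The Series contents and the helper `count_vowels` are hypothetical, made up only to illustrate the two patterns.

```python
import pandas as pd

# Hypothetical data for illustration.
s = pd.Series(["red apple", "kiwi"])

# Option 1: a nested list comprehension inside the lambda
# (outer loop over words, inner loop over the characters of each word).
letters = s.apply(lambda text: [ch for word in text.split() for ch in word])

# Option 2: a named function, called from the lambda.
def count_vowels(text):
    """Return the number of vowels in a string."""
    return sum(1 for ch in text if ch in "aeiou")

vowels = s.apply(lambda text: count_vowels(text))

# The function can also be passed directly, without the lambda wrapper.
vowels_direct = s.apply(count_vowels)
```

A named function is usually easier to test and reuse once the logic grows beyond a single expression.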