Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.
If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:
df.fillna({'Name':'.', 'City':'.'}, inplace=True)
This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can just throw another fillna on the end:
df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)
Edit (22 Apr 2021)
Functionality presumably changed since the original post: with inplace=True, fillna() returns None, so you can no longer chain two in-place fillna() operations. You can still chain, but you must assign the result of the chain back to df instead of modifying in place, like so:
df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
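For anyone who wants to try this end to end, here is a minimal sketch. The sample frame and its values are made up for illustration (only the 'Name' and 'City' column names come from the answer above). It fills those two columns with '.' and then everything that is still missing with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'Name' and 'City' column names come from the answer.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "Name": ["Jack", None, "Bob"],
    "City": [None, "SF", "LA"],
})

# Fill only the listed columns, each with its own replacement value.
partial = df.fillna({"Name": ".", "City": "."})

# Chain a second fillna (no inplace) to fill whatever is still missing.
filled = df.fillna({"Name": ".", "City": "."}).fillna(0)

print(filled)
```

Note that partial still has a NaN in column A, since that column was not listed in the dict.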
Answer from Rob Bulmahn on Stack Overflow
You could use apply on your columns, checking dtype.kind to decide whether each column is numeric:
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))
print(res)
A B City Name
0 1.0 0.25 Seattle Jack
1 2.1 0.00 SF Sue
2 0.0 0.00 LA .
3 4.7 4.00 OC Bob
4 5.6 12.20 . Alice
5 6.8 14.40 . John
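To run this yourself you need an input frame, which the answer doesn't show. The sketch below guesses one by assuming every 0 and '.' in the printed result was originally NaN, so treat the data as an assumption rather than the original.

```python
import numpy as np
import pandas as pd

# Assumed input: NaN positions are guessed from the printed result above.
df = pd.DataFrame({
    "A": [1.0, 2.1, np.nan, 4.7, 5.6, 6.8],
    "B": [0.25, np.nan, np.nan, 4.0, 12.2, 14.4],
    "City": ["Seattle", "SF", "LA", "OC", np.nan, np.nan],
    "Name": ["Jack", "Sue", np.nan, "Bob", "Alice", "John"],
})

# dtype.kind is a one-letter code: b=bool, i=int, u=unsigned int, f=float, c=complex.
# Numeric columns get 0; everything else gets '.'.
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in "biufc" else x.fillna("."))

print(res)
```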
You can select your desired columns and do it by assignment:
df[['a', 'b']] = df[['a','b']].fillna(value=0)
The resulting output is as expected:
a b c
0 1.0 4.0 NaN
1 2.0 5.0 NaN
2 3.0 0.0 7.0
3 0.0 6.0 8.0
You can pass a dict to fillna to use a different value for each column:
df.fillna({'a':0,'b':0})
Out[829]:
a b c
0 1.0 4.0 NaN
1 2.0 5.0 NaN
2 3.0 0.0 7.0
3 0.0 6.0 8.0
After assigning it back:
df=df.fillna({'a':0,'b':0})
df
Out[831]:
a b c
0 1.0 4.0 NaN
1 2.0 5.0 NaN
2 3.0 0.0 7.0
3 0.0 6.0 8.0
Pandas: is there a way to do fillna() on multiple columns at once?
It should be as simple as df = df.fillna(value=0), with whatever value you want instead of 0. I'm using 17.1.
Pandas: Is it possible to use the fillna() method using a calculation between two columns of a specific row?
python - Using pandas fillna() on multiple columns - Stack Overflow
Fillna with inplace=True not working with multiple columns but fine with single column
Heya, I was wondering if there's a way to fillna on multiple columns at once in a Pandas' DataFrame. Currently I just do them one by one, row after row. Seems like there should be an easier way. If it helps, the fillna value I want to use is the same for all columns.
Looking forward to hearing your tricks!
UPDATE [3/5]: to be clear, I want to fillna multiple columns, which are just a subset of the original df (that is, there are some columns I do not want/need to fillna).
I am currently cleaning data using Pandas of bike sales.
Each bike sale is broken down into: 'Quantity_Sold' 'Total_Cost', 'Total_Revenue', 'Total_Profit' , 'Unit_Cost', 'Unit_Price', 'Unit_Profit'.
There are null values in some of these columns; however, it is possible to calculate a missing column's value using the remaining columns that are filled. For example, a null "Total_Cost" can be calculated as "Unit_Cost" * "Quantity_Sold", etc.
How do I use the fillna() method to do this, so I can fill in the columns without resorting to mean, median and averages?
fillna is generally for carrying an observation forward or backward. Instead, I'd use np.where... If I understand what you're asking.
import numpy as np
np.where(np.isnan(df['newcolumn1']), df['oldcolumn1'], df['newcolumn1'])
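For the bike-sales case specifically, another option is to pass a computed Series to fillna, which only touches the rows where the target is missing. This is a rough sketch with made-up numbers; only the column names come from the question, and the relationships between columns (total = unit * quantity) are assumed from the question's description.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names are taken from the question.
sales = pd.DataFrame({
    "Quantity_Sold": [2, 3, 1],
    "Unit_Cost": [100.0, 50.0, np.nan],
    "Total_Cost": [200.0, np.nan, 80.0],
})

# fillna accepts a Series and aligns it on the index, so only the
# missing Total_Cost rows take the computed Unit_Cost * Quantity_Sold.
sales["Total_Cost"] = sales["Total_Cost"].fillna(
    sales["Unit_Cost"] * sales["Quantity_Sold"]
)

# The same idea in the other direction: recover Unit_Cost from the total.
sales["Unit_Cost"] = sales["Unit_Cost"].fillna(
    sales["Total_Cost"] / sales["Quantity_Sold"]
)

print(sales)
```

Rows that already have a value are left alone, so existing data is never overwritten by the calculation.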
To answer your question: yes. Look at using the value argument of fillna. Along with the to_dict() method on the other dataframe.
But to really solve your problem, have a look at the update() method of the DataFrame. Assuming your two dataframes are similarly indexed, I think it's exactly what you want.
In [36]: df = pd.DataFrame({'A': [0, np.nan, 2, 3, np.nan, 5], 'B': [1, 0, 1, np.nan, np.nan, 1]})
In [37]: df
Out[37]:
A B
0 0 1
1 NaN 0
2 2 1
3 3 NaN
4 NaN NaN
5 5 1
In [38]: df2 = pd.DataFrame({'A': [0, np.nan, 2, 3, 4, 5], 'B': [1, 0, 1, 1, 0, 0]})
In [40]: df2
Out[40]:
A B
0 0 1
1 NaN 0
2 2 1
3 3 1
4 4 0
5 5 0
In [52]: df.update(df2, overwrite=False)
In [53]: df
Out[53]:
A B
0 0 1
1 NaN 0
2 2 1
3 3 1
4 4 0
5 5 1
Notice that all the NaNs in df were replaced except for (1, A) since that was also NaN in df2. Also some of the values like (5, B) differed between df and df2. By using overwrite=False it keeps the value from df.
EDIT: Based on comments, it seems like you're looking for a solution where the column names don't match across the two DataFrames (it'd be helpful if you posted sample data). Let's try that, replacing column A with C and B with D.
In [33]: df = pd.DataFrame({'A': [0, np.nan, 2, 3, np.nan, 5], 'B': [1, 0, 1, np.nan, np.nan, 1]})
In [34]: df2 = pd.DataFrame({'C': [0, np.nan, 2, 3, 4, 5], 'D': [1, 0, 1, 1, 0, 0]})
In [35]: df
Out[35]:
A B
0 0 1
1 NaN 0
2 2 1
3 3 NaN
4 NaN NaN
5 5 1
In [36]: df2
Out[36]:
C D
0 0 1
1 NaN 0
2 2 1
3 3 1
4 4 0
5 5 0
In [37]: d = {'A': df2.C, 'B': df2.D} # pass these values to fillna
In [38]: df
Out[38]:
A B
0 0 1
1 NaN 0
2 2 1
3 3 NaN
4 NaN NaN
5 5 1
In [40]: df.fillna(value=d)
Out[40]:
A B
0 0 1
1 NaN 0
2 2 1
3 3 1
4 4 0
5 5 1
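The session above can be condensed into a standalone script, roughly like this. It uses the same data as the answer. The variable name filled is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, np.nan, 2, 3, np.nan, 5],
                   "B": [1, 0, 1, np.nan, np.nan, 1]})
df2 = pd.DataFrame({"C": [0, np.nan, 2, 3, 4, 5],
                    "D": [1, 0, 1, 1, 0, 0]})

# Map each column of df to the df2 column that should supply its missing values.
d = {"A": df2["C"], "B": df2["D"]}

# Each Series is aligned on the index; values already present in df are kept.
filled = df.fillna(value=d)

print(filled)
```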
I think if you invest the time to learn pandas you'll hit fewer moments of frustration. It's a massive library though, so it takes time.
These answers are guided by the fact that OP wanted an in place edit of an existing dataframe. Usually, I overwrite the existing dataframe with a new one.
Use pandas.DataFrame.fillna with a dict
Pandas fillna allows us to pass a dictionary that specifies which columns will be filled in and with what.
So this will work
a.fillna({'a': 0, 'b': 0})
a b c
0 1.0 5.0 5
1 2.0 0.0 1
2 0.0 6.0 5
3 0.0 0.0 2
With an in place edit made possible with:
a.fillna({'a': 0, 'b': 0}, inplace=True)
NOTE: I would've just done this: a = a.fillna({'a': 0, 'b': 0})
We don't save text length but we could get cute using dict.fromkeys
a.fillna(dict.fromkeys(['a', 'b'], 0), inplace=True)
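Here is a self-contained sketch of the dict.fromkeys variant. The input frame is an assumption, reconstructed from the printed output above by treating the zeros in columns a and b as originally missing.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the printed output above.
a = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, np.nan],
    "b": [5.0, np.nan, 6.0, np.nan],
    "c": [5, 1, 5, 2],
})

cols = ["a", "b"]

# dict.fromkeys(cols, 0) builds {'a': 0, 'b': 0}; handy when the column list is long.
fill_values = dict.fromkeys(cols, 0)
result = a.fillna(fill_values)

print(result)
```

This pays off mostly when the list of columns is built programmatically and all of them share one fill value.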
loc
We can use the same format as the OP but place it in the correct columns using loc
a.loc[:, ['a', 'b']] = a[['a', 'b']].fillna(0)
a
a b c
0 1.0 5.0 5
1 2.0 0.0 1
2 0.0 6.0 5
3 0.0 0.0 2
pandas.DataFrame.update
Explicitly made to make in place edits with the non-null values of another dataframe
a.update(a[['a', 'b']].fillna(0))
a
a b c
0 1.0 5.0 5
1 2.0 0.0 1
2 0.0 6.0 5
3 0.0 0.0 2
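A runnable version of the update approach might look like the sketch below. Again, the input frame is an assumption reconstructed from the printed output. update modifies a in place and returns None, so there is nothing to assign.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the printed output above.
a = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, np.nan],
    "b": [5.0, np.nan, 6.0, np.nan],
    "c": [5, 1, 5, 2],
})

# Build the filled subset first, then write its non-null values back into a.
filled_subset = a[["a", "b"]].fillna(0)
returned = a.update(filled_subset)  # in place; the return value is None

print(a)
```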
Iterate column by column
I really don't like this approach because it is unnecessarily verbose
for col in ['a', 'b']:
    a[col] = a[col].fillna(0)  # assign back; in-place fillna on a[col] may act on a copy
a
a b c
0 1.0 5.0 5
1 2.0 0.0 1
2 0.0 6.0 5
3 0.0 0.0 2
fillna with a dataframe
Use the result of a[['a', 'b']].fillna(0) as the input for another fillna. In my opinion, this is silly. Just use the first option.
a.fillna(a[['a', 'b']].fillna(0), inplace=True)
a
a b c
0 1.0 5.0 5
1 2.0 0.0 1
2 0.0 6.0 5
3 0.0 0.0 2
EDIT: As @piRSquared pointed out, the first solution should be
a.loc[:, ['a', 'b']] = a[['a', 'b']].fillna(0)
to fillna in selected columns
or
a.fillna(0, inplace = True)
to fillna in all the columns