DataFrame.fillna() or Series.fillna() will do this for you.
Example:
In [7]: df
Out[7]:
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN
In [8]: df.fillna(0)
Out[8]:
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000
To fill the NaNs in only one column, select just that column.
In [12]: df[1] = df[1].fillna(0)
In [13]: df
Out[13]:
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000
Or you can use the built-in column-specific functionality:
df = df.fillna({1: 0})
Answer from Aman on Stack Overflow.
It is not guaranteed that slicing returns a view or a copy, so rather than filling a slice in place you can do
df['column'] = df['column'].fillna(value)
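A short runnable sketch tying the three variants together (the numbers are the made-up values from the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mirroring the example output above
df = pd.DataFrame({0: [np.nan, -0.494375, np.nan, 1.876360, np.nan],
                   1: [np.nan, 0.570994, np.nan, -0.229738, np.nan]})

# 1) Fill every NaN in the whole frame
filled_all = df.fillna(0)
assert filled_all.isna().sum().sum() == 0

# 2) Fill a single column, assigning the result back
#    (safe regardless of whether the slice was a view or a copy)
df[1] = df[1].fillna(0)
assert df[1].isna().sum() == 0 and df[0].isna().sum() == 3

# 3) Column-specific fill via a dict of {column: value}
df2 = df.fillna({0: 0})
assert df2.isna().sum().sum() == 0
```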
I've been working on learning Python and for something to code, I picked some VBA that I had.
In VBA:
If Cells(I, "C").Value <> "" And Cells(I, "B").Value = "" Then
Cells(I, "B").Value = Cells(I, "C").Value
End If
It simply checks whether colC is not Null and colB is Null, and if so replaces colB with the value from colC.
I can read in the csv file, I was able to select and delete some rows I didn't want, but I can't seem to get the syntax right for this...
Assuming your DataFrame is in df:
df.Temp_Rating.fillna(df.Farheit, inplace=True)
del df['Farheit']
df.columns = 'File heat Observations'.split()
This first replaces any NaN values in Temp_Rating with the corresponding value of df.Farheit, then deletes the 'Farheit' column, and finally renames the columns. Here's the resulting DataFrame:
    File  heat  Observations
0      1  YesQ            75
1      1   NoR           115
2      1  YesA            63
3      1   NoT            41
4      1   NoY            80
5      1  YesZ            12
6      2  YesQ           111
7      2   NoR            60
8      2  YesA            19
9      2   NoT            77
10     2   NoY            21
11     2  YesZ            54
12     3  YesQ            84
13     3   NoR            67
14     3  YesA            94
15     3   NoT            39
16     3   NoY            46
17     3  YesZ            81
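The same steps can be sketched end to end on a small made-up DataFrame (the Temp_Rating/Farheit column names follow the answer, but the data here is hypothetical; assignment is used instead of inplace=True so the fill works regardless of view-versus-copy behavior):

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in Temp_Rating that Farheit should fill
df = pd.DataFrame({'File': [1, 1, 1],
                   'Temp_Rating': [np.nan, 'NoR', np.nan],
                   'Farheit': ['YesQ', np.nan, 'YesA'],
                   'Observations': [75, 115, 63]})

# Fill Temp_Rating's NaNs row by row from Farheit (fillna with a Series
# aligns on the index), then drop the source column and rename
df['Temp_Rating'] = df['Temp_Rating'].fillna(df['Farheit'])
del df['Farheit']
df.columns = 'File heat Observations'.split()
```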
The above-mentioned solutions did not work for me. The method I used was:
df.loc[df['foo'].isnull(),'foo'] = df['bar']
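A minimal sketch of that mask-based assignment, using hypothetical columns 'foo' and 'bar':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [1.0, np.nan, 3.0, np.nan],
                   'bar': [10.0, 20.0, 30.0, 40.0]})

# Where 'foo' is missing, copy the value from the same row of 'bar';
# .loc assignment aligns the right-hand Series on the row index
df.loc[df['foo'].isnull(), 'foo'] = df['bar']
# df['foo'] is now [1.0, 20.0, 3.0, 40.0]
```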
There is no one-size-fits-all technique, so you cannot assume that one method will work best for every dataset.
That being said, the goal of imputing missing values is to ensure that the distribution of the column does not change after imputation. So if you have a feature that follows a left-skewed distribution, the distribution should not change much after imputation.
Following this logic, try multiple imputation techniques and see which one best retains the original distribution of the feature you are imputing.
Mean is suitable when you have a Gaussian distribution of continuous data. Mode is suitable when your column holds categorical data and one category is clearly more likely to occur than the others. Median is better when your data has outliers that would skew the mean. You can opt to remove rows with missing values if their number is very small compared to the total number of rows. There are other techniques that can be useful depending on the situation, such as training a model to fill missing values, MICE (for missing-at-random data), KNNImputer, and LOCF.
Alternatively, if you have a significant number of missing values, you can see how the results are different when you impute missing values and when you ignore rows with missing values.
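As an illustration of why the choice matters, here is a small sketch (made-up skewed data with one outlier) comparing mean and median imputation:

```python
import numpy as np
import pandas as pd

# Hypothetical feature: mostly small values, one large outlier, two gaps
s = pd.Series([1.0, 2.0, 2.0, 3.0, np.nan, 100.0, np.nan])

mean_filled = s.fillna(s.mean())      # mean = 21.6, dragged up by the outlier
median_filled = s.fillna(s.median())  # median = 2.0, robust to the outlier

# Median imputation keeps the filled values near the bulk of the data,
# so the distribution changes far less than with the mean.
```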