I think you need to call replace with a dict:
import numpy as np
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace({'0': np.nan, 0: np.nan})
Answer from jezrael on Stack Overflow
You could also use the replace method, passing the values you want to replace as a list in the first parameter and the replacement value as the second:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace(['0', 0], np.nan)
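A minimal self-contained sketch of both replace variants follows. The DataFrame contents are made up for illustration; only two of the column names are borrowed from the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: zeros appear both as the integer 0 and as the string '0'.
df2 = pd.DataFrame({
    "Weight": [70, 0, 82],
    "Height": ["180", "0", 0],   # mixed types, so this column has object dtype
    "Name": ["a", "0", "c"],     # not listed in cols, so it is left untouched
})
cols = ["Weight", "Height"]

# Dict form: each key is a value to replace, each value is its replacement.
by_dict = df2[cols].replace({"0": np.nan, 0: np.nan})

# List form: every value in the list is replaced by the single second argument.
by_list = df2[cols].replace(["0", 0], np.nan)

# Assign back so only the selected columns change in the original frame.
df2[cols] = by_dict
```

Both forms should mark the same cells as missing; the dict form is only needed when different values map to different replacements.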
You can use update(), which overwrites values in place with the non-NA values from another DataFrame:
In [145]: df
Out[145]:
a b c d e
0 NaN NaN NaN 3 8
1 NaN NaN NaN 8 7
2 NaN NaN NaN 2 8
3 NaN NaN NaN 7 4
4 NaN NaN NaN 4 9
5 NaN NaN NaN 1 9
6 NaN NaN NaN 7 7
7 NaN NaN NaN 6 5
8 NaN NaN NaN 0 0
9 NaN NaN NaN 9 5
In [146]: df.update(df[['a','b','c']].fillna(0))
In [147]: df
Out[147]:
a b c d e
0 0.0 0.0 0.0 3 8
1 0.0 0.0 0.0 8 7
2 0.0 0.0 0.0 2 8
3 0.0 0.0 0.0 7 4
4 0.0 0.0 0.0 4 9
5 0.0 0.0 0.0 1 9
6 0.0 0.0 0.0 7 7
7 0.0 0.0 0.0 6 5
8 0.0 0.0 0.0 0 0
9 0.0 0.0 0.0 9 5
Alternatively, select the columns by name and assign the filled result back:
In [15]: cols = ['one', 'two']
In [16]: df
Out[16]:
one two three four five
a -0.343241 0.453029 -0.895119 bar False
b NaN NaN NaN NaN NaN
c 0.839174 0.229781 -1.244124 bar True
d NaN NaN NaN NaN NaN
e 1.300641 -1.797828 0.495313 bar True
f -0.182505 -1.527464 0.712738 bar False
g NaN NaN NaN NaN NaN
h 0.626568 -0.971003 1.192831 bar True
In [17]: df[cols]=df[cols].fillna(0)
In [18]: df
Out[18]:
one two three four five
a -0.343241 0.453029 -0.895119 bar False
b 0.000000 0.000000 NaN NaN NaN
c 0.839174 0.229781 -1.244124 bar True
d 0.000000 0.000000 NaN NaN NaN
e 1.300641 -1.797828 0.495313 bar True
f -0.182505 -1.527464 0.712738 bar False
g 0.000000 0.000000 NaN NaN NaN
h 0.626568 -0.971003 1.192831 bar True
DataFrame.fillna() or Series.fillna() will do this for you.
Example:
In [7]: df
Out[7]:
0 1
0 NaN NaN
1 -0.494375 0.570994
2 NaN NaN
3 1.876360 -0.229738
4 NaN NaN
In [8]: df.fillna(0)
Out[8]:
0 1
0 0.000000 0.000000
1 -0.494375 0.570994
2 0.000000 0.000000
3 1.876360 -0.229738
4 0.000000 0.000000
To fill the NaNs in only one column, select just that column.
In [12]: df[1] = df[1].fillna(0)
In [13]: df
Out[13]:
0 1
0 NaN 0.000000
1 -0.494375 0.570994
2 NaN 0.000000
3 1.876360 -0.229738
4 NaN 0.000000
Or you can use the built-in column-specific functionality by passing a dict that maps column labels to fill values:
df = df.fillna({1: 0})
Slicing is not guaranteed to return a view rather than a copy, so an in-place fillna on a slice may not change the original DataFrame. Assign the result back instead:
df['column'] = df['column'].fillna(value)
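The dict form of fillna can also carry one entry per column, which fills several columns with different values in a single call. A small sketch, with hypothetical column names and fill values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, "x", "y"],
    "c": [np.nan, 2.0, np.nan],
})

# Each key is a column label, each value is the fill value for that column.
# Column "c" is not listed, so its NaNs are left alone.
filled = df.fillna({"a": 0, "b": "missing"})
```

Because fillna returns a new DataFrame here, df itself is unchanged; assign the result back to df to keep it.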
I've been working on learning Python and for something to code, I picked some VBA that I had.
In VBA:
If Cells(I, "C").Value <> "" And Cells(I, "B").Value = "" Then
Cells(I, "B").Value = Cells(I, "C").Value
End If
It simply checks whether colC is not Null and colB is Null, then replaces colB with the value from colC.
I can read in the csv file, I was able to select and delete some rows I didn't want, but I can't seem to get the syntax right for this...
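One way to express the VBA rule in pandas, as a sketch: after read_csv, empty cells normally arrive as NaN, so filling column B from column C wherever B is missing covers the same condition. The column labels B and C and the inline CSV text are assumptions standing in for the real file.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real script would pass a file path to read_csv.
csv_text = "A,B,C\n1,x,y\n2,,z\n3,,\n4,w,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Where B is NaN, take the value from C (aligned by row index).
# Rows where C is also NaN simply stay NaN, matching the VBA check on colC.
df["B"] = df["B"].fillna(df["C"])
```

Passing a Series to fillna aligns on the index, so no explicit loop over rows is needed.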
You can use the DataFrame.fillna function to fill the NaN values in your data. For example, assuming your data is in a DataFrame called df,
df.fillna(0, inplace=True)
will replace the missing values with the constant value 0. You can also do more clever things, such as replacing the missing values with the mean of that column:
df.fillna(df.mean(numeric_only=True), inplace=True)
or take the last value seen for a column:
df.ffill(inplace=True)
Filling the NaN values is called imputation. Try a range of different imputation methods and see which ones work best for your data.
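To compare the strategies side by side, here is a small sketch on made-up numeric data; the column names x and y are placeholders.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan],
    "y": [10.0, 20.0, np.nan, 40.0],
})

constant = df.fillna(0)             # every NaN becomes 0
mean_filled = df.fillna(df.mean())  # every NaN becomes its column's mean
forward = df.ffill()                # every NaN takes the last value seen above it
```

Each call returns a new DataFrame, so the three results can be inspected or scored against each other without touching df.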
# Taking care of missing data
# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; SimpleImputer replaces it
import numpy as np
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')  # imputes column-wise
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
Suppose the array is named $X$ and the missing data is in the columns indexed $1$ and $2$, to be replaced with the column mean. The SimpleImputer class from scikit-learn (the successor to the older Imputer class) handles this.
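To see what mean imputation does to the array, the same result can be reproduced with plain NumPy. This is a sketch on a made-up array X, not a replacement for the scikit-learn class.

```python
import numpy as np

# Hypothetical array; columns 1 and 2 are the ones to impute.
X = np.array([
    [1.0, 2.0, np.nan, 7.0],
    [2.0, np.nan, 6.0, 8.0],
    [3.0, 4.0, 8.0, np.nan],
])

sub = X[:, 1:3]                      # the two columns of interest
col_means = np.nanmean(sub, axis=0)  # per-column mean, ignoring NaN

# Replace NaN entries with their column's mean; other entries are kept.
X[:, 1:3] = np.where(np.isnan(sub), col_means, sub)
```

The NaN in the last column is left as is, because only the slice 1:3 is imputed.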