Updated answer, April 2025:
pd.to_numeric can convert arguments to a numeric type. The option errors='coerce' sets unconvertible values to NaN. However, it only works on 1D objects (i.e. a scalar, list, tuple, 1-d array, or Series). Therefore, to use it on a DataFrame, we need df.apply to convert each column individually. Note that any **kwargs given to apply are passed on to the function, so we can still set errors='coerce'.
Using pd.to_numeric along with df.apply will set any strings to NaN. If we want to convert those to 0 values, we can then use .fillna(0) on the resulting DataFrame.
For example (and note this also works with the strings suggested by the original question "$-" and "($24)"):
import pandas as pd

df = pd.DataFrame({
    'a': (1, 'sd', 1),
    'b': (2., 2., 'fg'),
    'c': (4, "$-", "($24)")
})
print(df)
#     a    b      c
# 0   1  2.0      4
# 1  sd  2.0     $-
# 2   1   fg  ($24)

df = df.apply(pd.to_numeric, errors='coerce').fillna(0)
print(df)
#      a    b    c
# 0  1.0  2.0  4.0
# 1  0.0  2.0  0.0
# 2  1.0  0.0  0.0
My original answer from 2015, which is now deprecated
You can use the convert_objects method of the DataFrame, with convert_numeric=True, to change the strings to NaNs.
From the docs:
convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.
In [17]: df
Out[17]:
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5

In [18]: df2 = df.convert_objects(convert_numeric=True)

In [19]: df2
Out[19]:
     a    b  c
0    1    2  4
1  NaN    2  4
2    1  NaN  5
Finally, if you want to convert those NaNs to 0s, you can use df.replace:

In [20]: df2.replace('NaN', 0)
Out[20]:
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5
Answer from tmdavison on Stack Overflow
Use pd.to_numeric to convert the strings to numeric values (setting unconvertible strings to NaN via the errors='coerce' option). Since pd.to_numeric only accepts 1-D input, apply it to each column of the DataFrame:
df = df.apply(pd.to_numeric, errors='coerce')
and then convert the NaN values to zeros using fillna (note that replace('NaN', 0) would only match the literal string 'NaN', not actual NaN values):
df = df.fillna(0)
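A minimal runnable sketch of this approach; the column names and values here are hypothetical, made up for illustration:

```python
import pandas as pd

# Hypothetical DataFrame mixing numbers and unparseable strings
df = pd.DataFrame({'a': [1, 'sd', 3], 'b': ['2', 'x', '4']})

# pd.to_numeric only accepts 1-D input, so apply it per column,
# then replace the resulting NaNs with zeros
cleaned = df.apply(pd.to_numeric, errors='coerce').fillna(0)
print(cleaned['a'].tolist())  # [1.0, 0.0, 3.0]
```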
You can use the to_numeric method, but it's not changing the value in place. You need to set the column to the new values:
training_data['usagequantity'] = (
pd.to_numeric(training_data['usagequantity'],
errors='coerce')
.fillna(0)
)
to_numeric sets the non-numeric values to NaNs, and then the chained fillna method replaces the NaNs with zeros.
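For instance, with made-up values standing in for the usagequantity column (the data here is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the real training data
training_data = pd.DataFrame({'usagequantity': ['3', 'n/a', '7.5', '']})

# Coerce non-numeric entries to NaN, then replace them with 0
training_data['usagequantity'] = (
    pd.to_numeric(training_data['usagequantity'], errors='coerce')
    .fillna(0)
)
print(training_data['usagequantity'].tolist())  # [3.0, 0.0, 7.5, 0.0]
```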
The following one-liner also works, additionally casting the result to integers:
df['col'] = pd.to_numeric(df['col'], errors='coerce').fillna(0).astype('int')
Hi all,
I decided to take my first try at a Kaggle competition; however, I've been struggling with something for a while now. Perhaps you can help.
Basically, I've got a DataFrame where the latitude and longitude (floats) are both zero for a very small number of rows.
The standard deviation for these columns is tiny, so I was just going to replace the zero values with the mean values. How should I go about this? Nothing I have tried so far has worked.
Thanks.
I think you need replace with a dict:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace({'0':np.nan, 0:np.nan})
You could use the replace method, passing a list of the values you want to replace as the first parameter and the desired replacement as the second:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace(['0', 0], np.nan)
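To finish the original goal of replacing the zeros with column means, you can chain either variant with fillna; a sketch using hypothetical coordinate values:

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates; some rows have 0.0 where data is missing
df2 = pd.DataFrame({'lat': [50.0, 0.0, 52.0], 'lon': [-1.0, -3.0, 0.0]})

cols = ['lat', 'lon']
df2[cols] = df2[cols].replace(0, np.nan)        # treat zeros as missing
df2[cols] = df2[cols].fillna(df2[cols].mean())  # mean() skips NaN by default
print(df2['lat'].tolist())  # [50.0, 51.0, 52.0]
```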
Use mask:
df['X'] = df['X'].mask(df.Y == 1, 0)
df[['X', 'Y']] = df[['X', 'Y']].mask(df.Z == 1, 0)
Another solution with DataFrame.loc:
df.loc[df.Y == 1, 'X'] = 0
df.loc[df.Z == 1, ['X', 'Y']] = 0
print(df)
   X  Y  Z
0  0  0  1
1  0  1  0
2  0  0  1
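A self-contained sketch combining a mask call and a loc assignment; the starting values are hypothetical, chosen so the result matches the output shown above:

```python
import pandas as pd

# Hypothetical 0/1 frame with the columns from the answer
df = pd.DataFrame({'X': [1, 1, 1], 'Y': [1, 1, 1], 'Z': [1, 0, 1]})

df['X'] = df['X'].mask(df.Y == 1, 0)   # zero out X wherever Y == 1
df.loc[df.Z == 1, ['X', 'Y']] = 0      # zero out X and Y wherever Z == 1
print(df['Y'].tolist())  # [0, 1, 0]
```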
You can generalize this to keeping only the last 1 in each row and setting everything else to 0. For performance, operate on the underlying NumPy array:
import numpy as np

a = df.values
# index of the last 1 in each row
idx = (a.shape[1] - a[:, ::-1].argmax(1)) - 1
t = np.zeros(a.shape)
t[np.arange(a.shape[0]), idx] = 1
t
array([[0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]])
If you need the result back as a DataFrame:
pd.DataFrame(t, columns=df.columns, index=df.index).astype(int)
   X  Y  Z
0  0  0  1
1  0  1  0
2  0  0  1
