You need to add the parameter errors='coerce' to the to_numeric function:
ID = pd.to_numeric(ID, errors='coerce')
If ID is a column:
df.ID = pd.to_numeric(df.ID, errors='coerce')
Non-numeric values are converted to NaN, so all values become float.
To get int, first convert NaN to some value, e.g. 0, and then cast to int:
df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
Sample:
df = pd.DataFrame({'ID':['4806105017087','4806105017087','CN414149']})
print (df)
ID
0 4806105017087
1 4806105017087
2 CN414149
print (pd.to_numeric(df.ID, errors='coerce'))
0 4.806105e+12
1 4.806105e+12
2 NaN
Name: ID, dtype: float64
df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
print (df)
ID
0 4806105017087
1 4806105017087
2 0
EDIT: If you use pandas 0.25+, you can use the nullable integer dtype 'Int64', which keeps missing values without falling back to float:
df.ID = pd.to_numeric(df.ID, errors='coerce').astype('Int64')
print (df)
ID
0 4806105017087
1 4806105017087
2 NaN
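The two approaches above can be put side by side in one runnable sketch. The variable names (as_float, filled, nullable) are only illustrative and not part of the original answer, and the sentinel 0 is an assumption you may want to change.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4806105017087', 'CN414149']})

# Strings that cannot be parsed as numbers become NaN instead of raising.
as_float = pd.to_numeric(df['ID'], errors='coerce')

# Option 1: replace NaN with a sentinel (0 here) and cast to plain int64.
filled = as_float.fillna(0).astype('int64')

# Option 2: keep the missing value by using the nullable 'Int64' dtype.
nullable = as_float.astype('Int64')

print(filled.tolist())
print(nullable.isna().tolist())
```

Option 1 gives an ordinary int64 column but makes a failed parse indistinguishable from a real 0; option 2 keeps that distinction at the cost of using an extension dtype.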
Answer from jezrael on Stack Overflow
If you're here because you got
OverflowError: Python int too large to convert to C long
use .astype('int64') for 64-bit signed integers:
df['ID'] = df['ID'].astype('int64')
If you don't want to lose the values with letters in them, use str.replace() with a regex pattern to remove the non-digit characters:
df['ID'] = df['ID'].str.replace('[^0-9]', '', regex=True).astype('int64')
Then the input
0 4806105017087
1 4806105017087
2 CN414149
Name: ID, dtype: object
converts into
0 4806105017087
1 4806105017087
2 414149
Name: ID, dtype: int64
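One caveat with the digit-stripping approach: a value with no digits at all becomes an empty string, which astype('int64') cannot convert. A possible workaround, sketched here with a made-up 'unknown' entry that is not in the original data, is to combine it with to_numeric and the nullable 'Int64' dtype.

```python
import pandas as pd

# 'unknown' is a hypothetical extra value added to show the edge case.
s = pd.Series(['4806105017087', 'CN414149', 'unknown'], name='ID')

# Remove everything that is not a digit; 'unknown' becomes an empty string.
digits = s.str.replace(r'[^0-9]', '', regex=True)

# Coerce the empty string to a missing value, then use the nullable
# 'Int64' dtype so the column stays integer-typed.
ids = pd.to_numeric(digits, errors='coerce').astype('Int64')

print(ids)
```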
I have a pandas column of the data type string, with entries such as: array(['7 Average', '6 Low Average', '8 Good', '11 Excellent', '9 Better', '5 Fair', '10 Very Good', '12 Luxury', '4 Low', '3 Poor', '13 Mansion'], dtype=object). I am trying to change them to the data type int to make them useful for my linear regression project. I keep messing it up and can't get it right. Can anyone provide some assistance?
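One way to approach this, assuming every entry starts with the number that should be kept, is to capture the leading digits with str.extract and cast the result. The series name grades is made up for the example and only a few of the listed values are used.

```python
import pandas as pd

grades = pd.Series(['7 Average', '6 Low Average', '11 Excellent', '13 Mansion'])

# Capture the run of digits at the start of each string, then cast to int.
grade_num = grades.str.extract(r'^(\d+)', expand=False).astype('int64')

print(grade_num.tolist())
```

If some entries might not start with a digit, the extract step yields a missing value for them and the cast fails; in that case pd.to_numeric(..., errors='coerce') is the safer final step.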
I'm working on a data set where salary is written as ' $53K-$91K (Glassdoor est.) ' and I'm trying to convert it to an int, but I'm only able to extract the first number by typing:
df.Salary.str.extract(r'(\d+)', expand=False).astype(int)
Any tips on how I can get 53-91 out of that?
Thanks!
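A possible approach, assuming every value follows the '$<low>K-$<high>K' pattern shown in the question, is to use two capture groups so that str.extract returns one column per number. The column names low_k and high_k and the second sample row are invented for this sketch.

```python
import pandas as pd

salary = pd.Series([' $53K-$91K (Glassdoor est.) ',
                    '$120K-$150K (Glassdoor est.)'])

# Two capture groups give a DataFrame with one column per group.
bounds = salary.str.extract(r'\$(\d+)K-\$(\d+)K').astype('int64')
bounds.columns = ['low_k', 'high_k']

# Rebuild the "53-91" style text if a string is wanted instead of two ints.
range_text = bounds['low_k'].astype(str) + '-' + bounds['high_k'].astype(str)

print(bounds)
print(range_text.tolist())
```

Keeping the two bounds as separate integer columns is usually more useful for modelling than a single "53-91" string, since a range cannot be stored as one int.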
You're trying to compare a scalar with the entire series, which raises the ValueError you saw. A simple method would be to cast the boolean series to int:
In [84]:
df['viz'] = (df['viz'] !='n').astype(int)
df
Out[84]:
   viz  a1_count  a1_mean     a1_std
0    0         3        2   0.816497
1    1         0      NaN        NaN
2    0         2       51  50.000000
You can also use np.where:
In [86]:
df['viz'] = np.where(df['viz'] == 'n', 0, 1)
df
Out[86]:
   viz  a1_count  a1_mean     a1_std
0    0         3        2   0.816497
1    1         0      NaN        NaN
2    0         2       51  50.000000
Output from the boolean comparison:
In [89]:
df['viz'] !='n'
Out[89]:
0 False
1 True
2 False
Name: viz, dtype: bool
And then casting to int:
In [90]:
(df['viz'] !='n').astype(int)
Out[90]:
0 0
1 1
2 0
Name: viz, dtype: int32
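For a self-contained version of the two methods above, here is a small sketch. The 'y' value in the sample frame is an assumption, since the original column contents other than 'n' are not shown. Note that astype(int) gives int32 on some platforms (as in the output above) and int64 on others, so the sketch requests 'int64' explicitly.

```python
import numpy as np
import pandas as pd

# 'y' is a placeholder for whatever non-'n' values the real column holds.
df = pd.DataFrame({'viz': ['n', 'y', 'n']})

# Element-wise comparison gives a boolean Series; casting maps False/True to 0/1.
flags = (df['viz'] != 'n').astype('int64')

# Equivalent result with np.where: 0 where the value is 'n', 1 elsewhere.
via_where = np.where(df['viz'] == 'n', 0, 1)

print(flags.tolist())
print(via_where.tolist())
```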
From @TMWP's comment above:
pd.to_numeric(myDF['myDFCell'], errors='coerce')
It works like a charm and is a quick and simple one-liner.