In [16]: df = DataFrame(np.arange(10).reshape(5,2),columns=list('AB'))
In [17]: df
Out[17]:
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
In [18]: df.dtypes
Out[18]:
A int64
B int64
dtype: object
Convert a series
In [19]: df['A'].apply(str)
Out[19]:
0 0
1 2
2 4
3 6
4 8
Name: A, dtype: object
In [20]: df['A'].apply(str)[0]
Out[20]: '0'
Don't forget to assign the result back:
df['A'] = df['A'].apply(str)
Convert the whole frame
In [21]: df.applymap(str)
Out[21]:
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
In [22]: df.applymap(str).iloc[0,0]
Out[22]: '0'
df = df.applymap(str)
Answer from Jeff on Stack Overflow
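In newer pandas releases (2.1 and later), DataFrame.applymap is deprecated in favour of DataFrame.map. A minimal sketch, assuming the same frame as in the answer above, that should work on either side of that change (the helper name to_each_cell is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("AB"))

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
to_each_cell = df.map if hasattr(pd.DataFrame, "map") else df.applymap
df = to_each_cell(str)

print(type(df.iloc[0, 0]))
```

Either way, each cell ends up as a Python str rather than an integer.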
Change data type of DataFrame column:
To int:
df.column_name = df.column_name.astype(np.int64)
To str:
df.column_name = df.column_name.astype(str)
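As a small sketch of the two casts above, using a made-up frame (the column names qty and code are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"qty": ["1", "2", "3"], "code": [10, 20, 30]})

df["qty"] = df["qty"].astype(np.int64)   # digit strings -> 64-bit ints
df["code"] = df["code"].astype(str)      # ints -> strings

total = df["qty"].sum()
first_code = df["code"].iloc[0]
print(total, repr(first_code))
```

Note that astype(np.int64) raises a ValueError if any value cannot be parsed as an integer, so it only suits columns that are already clean.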
You need to add the parameter errors='coerce' to to_numeric:
ID = pd.to_numeric(ID, errors='coerce')
If ID is a column:
df.ID = pd.to_numeric(df.ID, errors='coerce')
However, non-numeric values are converted to NaN, so all values become floats.
To get ints, replace NaN with some value, e.g. 0, and then cast to int:
df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
Sample:
df = pd.DataFrame({'ID':['4806105017087','4806105017087','CN414149']})
print (df)
ID
0 4806105017087
1 4806105017087
2 CN414149
print (pd.to_numeric(df.ID, errors='coerce'))
0 4.806105e+12
1 4.806105e+12
2 NaN
Name: ID, dtype: float64
df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
print (df)
ID
0 4806105017087
1 4806105017087
2 0
EDIT: With pandas 0.25+ it is possible to use the nullable integer dtype (integer_na):
df.ID = pd.to_numeric(df.ID, errors='coerce').astype('Int64')
print (df)
ID
0 4806105017087
1 4806105017087
2 NaN
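Before filling or dropping the coerced values, it can help to see which entries failed to convert. A minimal sketch, assuming the same sample frame as above (the names converted, failed and bad_values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"ID": ["4806105017087", "4806105017087", "CN414149"]})

converted = pd.to_numeric(df["ID"], errors="coerce")

# Rows that were present originally but became NaN after coercion.
failed = df.loc[converted.isna() & df["ID"].notna(), "ID"]
bad_values = failed.tolist()
print(bad_values)
```

Inspecting bad_values first makes it easier to decide whether 0, a missing value, or a cleaned-up string is the right replacement.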
If you're here because you got
OverflowError: Python int too large to convert to C long
use .astype('int64') for 64-bit signed integers:
df['ID'] = df['ID'].astype('int64')
If you don't want to lose the values with letters in them, use str.replace() with a regex pattern to remove the non-digit characters.
df['ID'] = df['ID'].str.replace('[^0-9]', '', regex=True).astype('int64')
Then the input
0 4806105017087
1 4806105017087
2 CN414149
Name: ID, dtype: object
converts into
0 4806105017087
1 4806105017087
2 414149
Name: ID, dtype: int64
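One caveat with the regex approach: a value containing no digits at all becomes an empty string, and astype('int64') raises a ValueError on it. A hedged sketch of one way around that, combining the replacement with to_numeric and the nullable Int64 dtype (the third sample value is invented for illustration):

```python
import pandas as pd

s = pd.Series(["4806105017087", "CN414149", "unknown"])

# Strip everything that is not a digit; "unknown" becomes "".
digits = s.str.replace(r"[^0-9]", "", regex=True)

# Empty strings are coerced to missing values instead of raising.
ids = pd.to_numeric(digits, errors="coerce").astype("Int64")
print(ids)
```

This keeps the digit-only part of mixed values and leaves a missing value where nothing numeric remains.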
I have a pandas column of the data type string, with entries such as: array(['7 Average', '6 Low Average', '8 Good', '11 Excellent', '9 Better', '5 Fair', '10 Very Good', '12 Luxury', '4 Low', '3 Poor', '13 Mansion'], dtype=object). I am trying to change them to the data type int to make them useful for my linear regression project. I keep messing it up and can't get it right. Can anyone provide some assistance?
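One possible approach, sketched under the assumption that every entry starts with the number of interest: pull out the leading digits with str.extract and cast the result. The variable names and the shortened sample are illustrative only.

```python
import pandas as pd

grades = pd.Series(["7 Average", "6 Low Average", "8 Good", "11 Excellent", "13 Mansion"])

# Capture the leading run of digits, then cast to a 64-bit integer.
grade_num = grades.str.extract(r"^(\d+)", expand=False).astype("int64")
print(grade_num.tolist())
```

If some entries might not start with a digit, the extracted value would be missing and the cast would fail, so pd.to_numeric with errors='coerce' may be the safer final step in that case.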
This is intended behaviour. This is how pandas stores strings.
From the docs
Pandas uses the object dtype for storing strings.
For a simple test, you can make a dummy DataFrame and check its dtype too.
import pandas as pd
df = pd.DataFrame(["abc", "ab"])
df[0].dtype
#Output:
dtype('O')
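If a dedicated string dtype is preferred over object, pandas 1.0 and later also offer an opt-in 'string' extension dtype. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series(["abc", "ab", None])

# Opt in to the nullable string extension dtype.
as_string = s.astype("string")

print(as_string.dtype)
lengths = as_string.str.len()
```

Missing entries stay missing under this dtype rather than being turned into the text 'None'.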
You can do that by using the apply() function in this way:
data['id'] = data['id'].apply(lambda x: str(x))
This will convert all the values of the id column to strings.
You can check the type of the values like this:
type(data['id'][0]) (this checks the first value of the 'id' column)
This will give the output str.
And data['id'].dtype will give dtype('O'), that is, object.
You can also use data.info() to check all the information about that DataFrame.
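Checking only the first value can miss mixed columns. A small sketch, using hypothetical data, that checks every element instead:

```python
import pandas as pd

data = pd.DataFrame({"id": [101, 102, 103]})   # hypothetical data
data["id"] = data["id"].apply(str)

# Check the Python type of every element, not just the first one.
all_str = all(isinstance(value, str) for value in data["id"])
print(all_str)
```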
You're trying to compare a scalar with the entire series, which raises the ValueError you saw. A simple method would be to cast the boolean series to int:
In [84]:
df['viz'] = (df['viz'] !='n').astype(int)
df
Out[84]:
viz a1_count a1_mean a1_std
0 0 3 2 0.816497
1 1 0 NaN NaN
2 0 2 51 50.000000
You can also use np.where:
In [86]:
df['viz'] = np.where(df['viz'] == 'n', 0, 1)
df
Out[86]:
viz a1_count a1_mean a1_std
0 0 3 2 0.816497
1 1 0 NaN NaN
2 0 2 51 50.000000
Output from the boolean comparison:
In [89]:
df['viz'] !='n'
Out[89]:
0 False
1 True
2 False
Name: viz, dtype: bool
And then casting to int:
In [90]:
(df['viz'] !='n').astype(int)
Out[90]:
0 0
1 1
2 0
Name: viz, dtype: int32
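To tie this answer together, here is a self-contained sketch with assumed sample values for viz (the original data is not shown). It reproduces one common way to hit the ValueError, using a whole Series in an if statement, and confirms that the two vectorised routes agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"viz": ["n", "y", "n"]})   # assumed sample values

# Using a Series comparison as a single truth value is ambiguous and raises.
try:
    if df["viz"] == "n":
        pass
except ValueError as exc:
    message = str(exc)

via_astype = (df["viz"] != "n").astype(int)
via_where = np.where(df["viz"] == "n", 0, 1)

# Both routes should produce the same 0/1 encoding.
same = (via_astype.to_numpy() == via_where).all()
print(via_astype.tolist(), same)
```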
From @TMWP's comment above:
pd.to_numeric(myDF['myDFCell'], errors='coerce')
It works like a charm and is a quick and simple one-liner.
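The same one-liner can be extended to several columns at once. A minimal sketch with hypothetical data, relying on DataFrame.apply forwarding keyword arguments to the function it calls on each column:

```python
import pandas as pd

myDF = pd.DataFrame({"a": ["1", "2", "x"], "b": ["3.5", "oops", "4"]})   # hypothetical data

# Each column is passed to pd.to_numeric with errors="coerce".
numeric = myDF.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)
```

Unparseable entries become NaN in every column, so the result is float unless the values are filled or cast afterwards.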