You can convert most of the columns by just calling convert_objects:
CopyIn [36]:
df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[36]:
Date object
WD int64
Manpower float64
2nd object
CTR object
2ndU float64
T1 int64
T2 int64
T3 int64
T4 float64
dtype: object
For column '2nd' and 'CTR' we can call the vectorised str methods to replace the thousands separator and remove the '%' sign and then astype to convert:
CopyIn [39]:
df['2nd'] = df['2nd'].str.replace(',','').astype(int)
df['CTR'] = df['CTR'].str.replace('%','').astype(np.float64)
df.dtypes
Out[39]:
Date object
WD int64
Manpower float64
2nd int32
CTR float64
2ndU float64
T1 int64
T2 int64
T3 int64
T4 object
dtype: object
In [40]:
df.head()
Out[40]:
Date WD Manpower 2nd CTR 2ndU T1 T2 T3 T4
0 2013/4/6 6 NaN 2645 5.27 0.29 407 533 454 368
1 2013/4/7 7 NaN 2118 5.89 0.31 257 659 583 369
2 2013/4/13 6 NaN 2470 5.38 0.29 354 531 473 383
3 2013/4/14 7 NaN 2033 6.77 0.37 396 748 681 458
4 2013/4/20 6 NaN 2690 5.38 0.29 361 528 541 381
Or you can do the string handling operations above without the call to astype and then call convert_objects to convert everything in one go.
UPDATE
Since version 0.17.0 convert_objects is deprecated and there isn't a top-level function to do this so you need to do:
df.apply(lambda col:pd.to_numeric(col, errors='coerce'))
See the docs and this related question: pandas: to_numeric for multiple columns
Answer from EdChum on Stack OverflowYou can convert most of the columns by just calling convert_objects:
CopyIn [36]:
df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[36]:
Date object
WD int64
Manpower float64
2nd object
CTR object
2ndU float64
T1 int64
T2 int64
T3 int64
T4 float64
dtype: object
For column '2nd' and 'CTR' we can call the vectorised str methods to replace the thousands separator and remove the '%' sign and then astype to convert:
CopyIn [39]:
df['2nd'] = df['2nd'].str.replace(',','').astype(int)
df['CTR'] = df['CTR'].str.replace('%','').astype(np.float64)
df.dtypes
Out[39]:
Date object
WD int64
Manpower float64
2nd int32
CTR float64
2ndU float64
T1 int64
T2 int64
T3 int64
T4 object
dtype: object
In [40]:
df.head()
Out[40]:
Date WD Manpower 2nd CTR 2ndU T1 T2 T3 T4
0 2013/4/6 6 NaN 2645 5.27 0.29 407 533 454 368
1 2013/4/7 7 NaN 2118 5.89 0.31 257 659 583 369
2 2013/4/13 6 NaN 2470 5.38 0.29 354 531 473 383
3 2013/4/14 7 NaN 2033 6.77 0.37 396 748 681 458
4 2013/4/20 6 NaN 2690 5.38 0.29 361 528 541 381
Or you can do the string handling operations above without the call to astype and then call convert_objects to convert everything in one go.
UPDATE
Since version 0.17.0 convert_objects is deprecated and there isn't a top-level function to do this so you need to do:
df.apply(lambda col:pd.to_numeric(col, errors='coerce'))
See the docs and this related question: pandas: to_numeric for multiple columns
convert_objects is deprecated.
For pandas >= 0.17.0, use pd.to_numeric
Copydf["2nd"] = pd.to_numeric(df["2nd"])
Videos
Solution for pandas 0.24+ for converting numeric with missing values:
df = pd.DataFrame({'column name':[7500000.0,7500000.0, np.nan]})
print (df['column name'])
0 7500000.0
1 7500000.0
2 NaN
Name: column name, dtype: float64
df['column name'] = df['column name'].astype(np.int64)
ValueError: Cannot convert non-finite values (NA or inf) to integer
#http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
df['column name'] = df['column name'].astype('Int64')
print (df['column name'])
0 7500000
1 7500000
2 NaN
Name: column name, dtype: Int64
I think you need cast to numpy.int64:
df['column name'].astype(np.int64)
Sample:
df = pd.DataFrame({'column name':[7500000.0,7500000.0]})
print (df['column name'])
0 7500000.0
1 7500000.0
Name: column name, dtype: float64
df['column name'] = df['column name'].astype(np.int64)
#same as
#df['column name'] = df['column name'].astype(pd.np.int64)
print (df['column name'])
0 7500000
1 7500000
Name: column name, dtype: int64
If some NaNs in columns need replace them to some int (e.g. 0) by fillna, because type of NaN is float:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})
df['column name'] = df['column name'].fillna(0).astype(np.int64)
print (df['column name'])
0 7500000
1 0
Name: column name, dtype: int64
Also check documentation - missing data casting rules
EDIT:
Convert values with NaNs is buggy:
df = pd.DataFrame({'column name':[7500000.0,np.nan]})
df['column name'] = df['column name'].values.astype(np.int64)
print (df['column name'])
0 7500000
1 -9223372036854775808
Name: column name, dtype: int64
You can need to pass in the string 'int64':
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1.0, 2.0]}) # some test dataframe
>>> df['a'].astype('int64')
0 1
1 2
Name: a, dtype: int64
There are some alternative ways to specify 64-bit integers:
>>> df['a'].astype('i8') # integer with 8 bytes (64 bit)
0 1
1 2
Name: a, dtype: int64
>>> import numpy as np
>>> df['a'].astype(np.int64) # native numpy 64 bit integer
0 1
1 2
Name: a, dtype: int64
Or use np.int64 directly on your column (but it returns a numpy.array):
>>> np.int64(df['a'])
array([1, 2], dtype=int64)