Use pd.to_numeric with errors='coerce'
# Setup
s = pd.Series(['1', '2', '3', '4', '.'])
s
0 1
1 2
2 3
3 4
4 .
dtype: object
pd.to_numeric(s, errors='coerce')
0 1.0
1 2.0
2 3.0
3 4.0
4 NaN
dtype: float64
If you need the NaNs filled in, use Series.fillna.
pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')
0 1
1 2
2 3
3 4
4 0
dtype: float64
Note, downcast='infer' will attempt to downcast floats to integers where possible. Remove the argument if you don't want that.
From v0.24+, pandas introduces a Nullable Integer type, which allows integers to coexist with NaNs. If you have integers in your column, you can use
pd.__version__ # '0.24.1' pd.to_numeric(s, errors='coerce').astype('Int32') 0 1 1 2 2 3 3 4 4 NaN dtype: Int32There are other options to choose from as well, read the docs for more.
Extension for DataFrames
If you need to extend this to DataFrames, you will need to apply it to each row. You can do this using DataFrame.apply.
# Setup.
np.random.seed(0)
df = pd.DataFrame({
'A' : np.random.choice(10, 5),
'C' : np.random.choice(10, 5),
'B' : ['1', '###', '...', 50, '234'],
'D' : ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df
A B C D
0 5 1 9 23
1 0 ### 3 1
2 3 ... 5 ...
3 3 50 2 268
4 7 234 4 $$
df.dtypes
A int64
B object
C int64
D object
dtype: object
df2 = df.apply(pd.to_numeric, errors='coerce')
df2
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
df2.dtypes
A int64
B float64
C int64
D float64
dtype: object
You can also do this with DataFrame.transform; although my tests indicate this is marginally slower:
df.transform(pd.to_numeric, errors='coerce')
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
If you have many columns (numeric; non-numeric), you can make this a little more performant by applying pd.to_numeric on the non-numeric columns only.
df.dtypes.eq(object)
A False
B True
C False
D True
dtype: bool
cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
# df[c] = pd.to_numeric(df[c], errors='coerce')
df
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
Applying pd.to_numeric along the columns (i.e., axis=0, the default) should be slightly faster for long DataFrames.
Use pd.to_numeric with errors='coerce'
# Setup
s = pd.Series(['1', '2', '3', '4', '.'])
s
0 1
1 2
2 3
3 4
4 .
dtype: object
pd.to_numeric(s, errors='coerce')
0 1.0
1 2.0
2 3.0
3 4.0
4 NaN
dtype: float64
If you need the NaNs filled in, use Series.fillna.
pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')
0 1
1 2
2 3
3 4
4 0
dtype: float64
Note, downcast='infer' will attempt to downcast floats to integers where possible. Remove the argument if you don't want that.
From v0.24+, pandas introduces a Nullable Integer type, which allows integers to coexist with NaNs. If you have integers in your column, you can use
pd.__version__ # '0.24.1' pd.to_numeric(s, errors='coerce').astype('Int32') 0 1 1 2 2 3 3 4 4 NaN dtype: Int32There are other options to choose from as well, read the docs for more.
Extension for DataFrames
If you need to extend this to DataFrames, you will need to apply it to each row. You can do this using DataFrame.apply.
# Setup.
np.random.seed(0)
df = pd.DataFrame({
'A' : np.random.choice(10, 5),
'C' : np.random.choice(10, 5),
'B' : ['1', '###', '...', 50, '234'],
'D' : ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df
A B C D
0 5 1 9 23
1 0 ### 3 1
2 3 ... 5 ...
3 3 50 2 268
4 7 234 4 $$
df.dtypes
A int64
B object
C int64
D object
dtype: object
df2 = df.apply(pd.to_numeric, errors='coerce')
df2
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
df2.dtypes
A int64
B float64
C int64
D float64
dtype: object
You can also do this with DataFrame.transform; although my tests indicate this is marginally slower:
df.transform(pd.to_numeric, errors='coerce')
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
If you have many columns (numeric; non-numeric), you can make this a little more performant by applying pd.to_numeric on the non-numeric columns only.
df.dtypes.eq(object)
A False
B True
C False
D True
dtype: bool
cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
# df[c] = pd.to_numeric(df[c], errors='coerce')
df
A B C D
0 5 1.0 9 23.0
1 0 NaN 3 1.0
2 3 NaN 5 NaN
3 3 50.0 2 268.0
4 7 234.0 4 NaN
Applying pd.to_numeric along the columns (i.e., axis=0, the default) should be slightly faster for long DataFrames.
In [30]: pd.Series([1,2,3,4,'.']).convert_objects(convert_numeric=True)
Out[30]:
0 1
1 2
2 3
3 4
4 NaN
dtype: float64
python - Pandas series to floats - Stack Overflow
TypeError: cannot convert the series to <type 'float'>
python - pandas.core.series.Series to float - Stack Overflow
python - Converting strings to floats in a DataFrame - Stack Overflow
Use DataFrame.replace with a dictionary, then DataFrame.astype to convert to float:
df = df.replace({'%': '', 'N/A': np.NaN}, regex=True).astype(float)
col1 col2
0 2.0 NaN
1 3.0 NaN
2 4.0 4.0
3 5.0 5.0
For your example this should work. Basically cleans the % symbol and then turns any that does not look like a number into NaN. Then it's trivial to turn into float. This is more resilient if other texts were to appear, but you still need to be able to clean the symbols around text
df = pd.DataFrame(data)
for col_name in df.columns:
aux_series = df[col_name].fillna('').str.replace('%', '')
aux_series[-aux_series.str.isnumeric()] = None
aux_series = aux_series.astype(float, errors='ignore')
df[col_name] = aux_series
a['col1'] = round(100.0 * a['col2'] / a['col3'], 1)
TypeError: cannot convert the series to <type 'float'>
Any idea why this is happening?
NOTE:
pd.convert_objectshas now been deprecated. You should usepd.Series.astype(float)orpd.to_numericas described in other answers.
This is available in 0.11. Forces conversion (or set's to nan)
This will work even when astype will fail; its also series by series
so it won't convert say a complete string column
CopyIn [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
A B
0 1.0 1.0
1 1 foo
In [12]: df.dtypes
Out[12]:
A object
B object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
A B
0 1 1
1 1 NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A float64
B float64
dtype: object
You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to specify how they should be converted, but you can use the .fillna method to do it.
Example:
CopyIn [12]: df
Out[12]:
a b
0 0.1 0.2
1 NaN 0.3
2 0.4 0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
a b
0 0.1 0.2
1 0.0 0.3
2 0.4 0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
To modify the float output do this:
Copydf= pd.DataFrame(range(5), columns=['a'])
df.a = df.a.astype(float)
df
Out[33]:
a
0 0.0000000
1 1.0000000
2 2.0000000
3 3.0000000
4 4.0000000
pd.options.display.float_format = '{:,.0f}'.format
df
Out[35]:
a
0 0
1 1
2 2
3 3
4 4
Use the pandas.DataFrame.astype(<type>) function to manipulate column dtypes.
Copy>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
A B C D
0 0.542447 0.949988 0.669239 0.879887
1 0.068542 0.757775 0.891903 0.384542
2 0.021274 0.587504 0.180426 0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
A B C D
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
EDIT:
To handle missing values:
Copy>>> df
A B C D
0 0.475103 0.355453 0.66 0.869336
1 0.260395 0.200287 NaN 0.617024
2 0.517692 0.735613 0.18 0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
A B C D
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0