NOTE:
pd.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in other answers.
This is available in 0.11. It forces conversion (or sets the value to NaN).
This will work even when astype would fail; it also works series by series,
so it won't convert, say, a column that consists entirely of strings.
In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
A B
0 1.0 1.0
1 1 foo
In [12]: df.dtypes
Out[12]:
A object
B object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
A B
0 1 1
1 1 NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A float64
B float64
dtype: object
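Since the note above points to pd.to_numeric as the replacement, here is a minimal sketch of the same conversion with it. The frame mirrors the session above; building it with dtype=object is an assumption made so the starting dtypes match the transcript.

```python
import pandas as pd

# Same data as the session above: two columns of numeric-looking strings.
# dtype=object is set explicitly so both columns start out as object.
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]}, dtype=object)

# Apply pd.to_numeric column by column; errors="coerce" turns
# unparseable values such as "foo" into NaN instead of raising.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

One difference to keep in mind: with errors="coerce", a column made up entirely of non-numeric strings becomes all NaN, whereas the note above says convert_objects would leave such a column alone.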
Answer from Jeff on Stack Overflow
You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to specify how they should be converted, but you can use the .fillna method to do it.
Example:
In [12]: df
Out[12]:
a b
0 0.1 0.2
1 NaN 0.3
2 0.4 0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
a b
0 0.1 0.2
1 0.0 0.3
2 0.4 0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
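For anyone who wants to run the session above, a self-contained sketch follows. The frame is rebuilt by hand from the printed output, so its exact construction is an assumption; the fill value 0.0 is the one used in the answer, not a general recommendation.

```python
import numpy as np
import pandas as pd

# Column "a" holds strings plus a missing value, so it is stored as object.
df = pd.DataFrame({
    "a": pd.Series(["0.1", np.nan, "0.4"], dtype=object),
    "b": [0.2, 0.3, 0.5],
})

# astype(float) parses the strings and keeps the missing value as NaN.
as_float = df["a"].astype(float)

# Filling NaN is a separate, explicit decision; 0.0 is used here.
df["a"] = as_float.fillna(0.0)

print(df)
```

Bracket indexing (df["a"]) is used instead of attribute assignment because it also works for column names that are not valid Python identifiers.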
Starting in 0.11.1 (coming out this week), replace has a new option to replace using a regex, so this becomes possible:
In [14]: df = DataFrame('10.0%',index=range(100),columns=range(10))
In [15]: df.replace('%','',regex=True).astype('float')/100
Out[15]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
0 100 non-null values
1 100 non-null values
2 100 non-null values
3 100 non-null values
4 100 non-null values
5 100 non-null values
6 100 non-null values
7 100 non-null values
8 100 non-null values
9 100 non-null values
dtypes: float64(10)
And a bit faster
In [16]: %timeit df.replace('%','',regex=True).astype('float')/100
1000 loops, best of 3: 1.16 ms per loop
In [18]: %timeit df.applymap(lambda x: float(x[:-1]))/100
1000 loops, best of 3: 1.67 ms per loop
df.applymap(lambda x:float(x.rstrip('%'))/100)
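The percent-stripping approaches above can also be written with the vectorized string methods, one column at a time. This is a sketch under the assumption that every cell is a string ending in '%'; the small frame is made up for illustration. Note that applymap has been deprecated in recent pandas releases in favor of DataFrame.map, which is one reason to prefer a column-wise form.

```python
import pandas as pd

# A small frame of percent strings, shaped like the example above.
df = pd.DataFrame("10.0%", index=range(3), columns=range(2))

# For each column: strip the trailing '%', parse as float, scale to a fraction.
fractions = df.apply(lambda col: col.str.rstrip("%").astype(float) / 100)

print(fractions)
```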
Assuming all values can be correctly converted to float, you can use the DataFrame.astype() function to convert the whole DataFrame to float. Example -
df = df.astype(float)
Demo -
In [5]: df = pd.DataFrame(np.array([['1', '2', '3'], ['4', '5', '6']]))
In [6]: df.astype(float)
Out[6]:
0 1 2
0 1 2 3
1 4 5 6
In [7]: df = df.astype(float)
In [8]: df.dtypes
Out[8]:
0 float64
1 float64
2 float64
dtype: object
The .astype() function also has a raise_on_error argument (which defaults to True), which you can set to False to make it ignore errors. In such cases, the original values are kept in the DataFrame -
In [10]: df = pd.DataFrame([['1', '2', '3'], ['4', '5', '6'],['blah','bloh','bleh']])
In [11]: df.astype(float,raise_on_error=False)
Out[11]:
0 1 2
0 1 2 3
1 4 5 6
2 blah bloh bleh
To convert just a series/column to float, again assuming all values can be converted, you can use Series.astype(). Example -
df['somecol'] = df['somecol'].astype(<type>)
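As far as I know, raise_on_error is no longer accepted by newer pandas releases, so the "keep the original on failure" behavior may need to be spelled out by hand. The sketch below does that per column; to_float_where_possible is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def to_float_where_possible(frame):
    """Hypothetical helper: cast each column to float, leaving columns that fail unchanged."""
    out = frame.copy()
    for name in out.columns:
        try:
            out[name] = out[name].astype(float)
        except (ValueError, TypeError):
            # At least one value could not be parsed; keep the column as it was.
            pass
    return out

df = pd.DataFrame({"x": ["1", "4"], "y": ["2", "blah"]}, dtype=object)
result = to_float_where_possible(df)

print(result.dtypes)
```

Unlike the whole-frame call shown above, this converts the columns that can be converted and only skips the ones that cannot.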
Another option is to use df.convert_objects(convert_numeric=True). It attempts to
convert numeric strings to numbers, with unconvertible values becoming NaN:
import pandas as pd
df = pd.DataFrame([['1', '2', '3'], ['4', '5', 'foo'], ['bar', 'baz', 'quux']])
df = df.convert_objects(convert_numeric=True)
print(df)
yields
0 1 2
0 1 2 3
1 4 5 NaN
2 NaN NaN NaN
In contrast, df.astype(float) would raise ValueError: could not convert string to float: quux since in the above DataFrame some strings (such as 'quux') are not numeric.
Note: in future versions of pandas (after 0.16.2) the function argument will be numeric=True instead of convert_numeric=True.
I want to do some math on a dataframe but (I think) can't get one column/series into the necessary format. The column contains strings; some are '.123' while others are '0'. When I attempt the math on the column of strings by converting everything to an integer like so:
dfteam1['cloff'] = dfteam1.cloff.astype(int)
I get the following error
ValueError: invalid literal for int() with base 10: '.123'
I think it's b/c .123 isn't an integer but a float, so I change the code like so:
dfteam1['cloff'] = dfteam1.cloff.astype(float)
now I get the following error
ValueError: could not convert string to float:
I think it's b/c 0 isn't a float but an integer? Do I need to change all the 0 values to 0.00 or am I completely off base? All feedback is welcome.
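A quick check suggests the string '0' is not the problem, since float('0') parses fine. The error message ends right after the colon, which hints (but does not prove) that the offending value is an empty string. The sketch below assumes that is the case; the sample values are made up.

```python
import pandas as pd

# '0' converts to float without any trouble.
print(float("0"))  # 0.0

# Assumed reconstruction of the column: the empty string is a guess
# based on the blank value in the error message.
cloff = pd.Series([".123", "0", ""], dtype=object)

# errors="coerce" turns anything unparseable into NaN instead of raising.
values = pd.to_numeric(cloff, errors="coerce")

print(values)
```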
I don't know if I totally understood your question but you can try
dataframe4['column2'] = dataframe4['column1'].apply(lambda x : float(x))
Edit : If there are some numbers with commas, you can try:
dataframe4['column2'] = dataframe4['column1'].apply(lambda x : float(x.replace(",","")))
The problem appears to be that you have commas in your floats, e.g. '9,826.000'
You can fix it as shown below
import re
re.sub(r",", "", "1,1000.20")
# returns '11000.20' and the below works
float(re.sub(r",", "", "1,1000.20"))
# you can e.g. use apply to apply to all your numbers in the DataFrame
df["new_col"] = df["old_col"].apply(lambda x: float(re.sub(r",", "", x)))
To still show the resulting float with commas afterwards in pandas, you can change the display setting for float as described here
IDK how you want to output these, but e.g. in the to_excel function, you can specify a float format, cf here or re-format the column before output, similar to the above. See this answer for some ideas.
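The comma removal above can also be done without apply, using the vectorized string methods. A minimal sketch, assuming every value in the column is a string; the column names follow the answer and the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"old_col": ["9,826.000", "1,100.20", "42"]}, dtype=object)

# Remove the thousands separators as a literal replacement, then parse as float.
df["new_col"] = df["old_col"].str.replace(",", "", regex=False).astype(float)

print(df)
```

If the data comes from a CSV file, read_csv also has a thousands parameter (for example thousands=',') that handles this while reading.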
I have a pandas dataframe, populated using pandas.read_csv. Most of the values in the csv file are numeric, but there is one column of the csv, 'Parameter 3', which contains a couple of string values. Because of these 2 string values, every other value in this column is being interpreted as a string too.
I can deal with this using the following code, but it is not very elegant;
for i in df_distns.index:
    try:
        df_distns.loc[i, 'Parameter 3'] = float(df_distns.loc[i, 'Parameter 3'])
    except:
        pass
Can anyone recommend a nicer way to do this (without looping)?
UPDATE: you don't need to convert your values afterwards, you can do it on-the-fly when reading your CSV:
In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)
In [166]: df.dtypes
Out[166]:
GeoName object
ComponentName object
IndustryId int64
IndustryClassification object
Description object
2004 int64
2005 int64
2006 int64
2007 int64
2008 int64
2009 int64
2010 int64
2011 int64
2012 int64
2013 int64
2014 float64
dtype: object
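To see the effect of na_values without the original URL, here is a small sketch that reads an in-memory CSV. The file contents and the 'name' column are invented for illustration; only the '(NA)' marker and the 'Parameter 3' column name come from the text above.

```python
import io

import pandas as pd

# Made-up CSV with a placeholder string in an otherwise numeric column.
csv_text = "name,Parameter 3\na,1.5\nb,(NA)\nc,2.0\n"

# Without na_values the column would be read as strings; with it, '(NA)'
# is treated as missing and the column is parsed as float.
df = pd.read_csv(io.StringIO(csv_text), na_values=["(NA)"])

print(df.dtypes)
```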
If you need to convert multiple columns to numeric dtypes - use the following technique:
Sample source DF:
In [271]: df
Out[271]:
id a b c d e f
0 id_3 AAA 6 3 5 8 1
1 id_9 3 7 5 7 3 BBB
2 id_7 4 2 3 5 4 2
3 id_0 7 3 5 7 9 4
4 id_0 2 4 6 4 0 2
In [272]: df.dtypes
Out[272]:
id object
a object
b int64
c int64
d int64
e int64
f object
dtype: object
Converting selected columns to numeric dtypes:
In [273]: cols = df.columns.drop('id')
In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
In [275]: df
Out[275]:
id a b c d e f
0 id_3 NaN 6 3 5 8 1.0
1 id_9 3.0 7 5 7 3 NaN
2 id_7 4.0 2 3 5 4 2.0
3 id_0 7.0 3 5 7 9 4.0
4 id_0 2.0 4 6 4 0 2.0
In [276]: df.dtypes
Out[276]:
id object
a float64
b int64
c int64
d int64
e int64
f float64
dtype: object
PS if you want to select all string (object) columns use the following simple trick:
cols = df.columns[df.dtypes.eq('object')]
Another way is a one-liner using apply:
cols = ['col1', 'col2', 'col3']
data[cols] = data[cols].apply(pd.to_numeric, errors='coerce', axis=1)
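Putting the pieces above together, the sketch below picks the object columns with the dtypes trick and converts them with pd.to_numeric. The data is a shortened, hand-built version of the sample frame, with dtype=object set explicitly as an assumption so the selection behaves the same across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "id": pd.Series(["id_3", "id_9", "id_7"], dtype=object),
    "a": pd.Series(["AAA", "3", "4"], dtype=object),
    "b": [6, 7, 2],
})

# Select the object columns, but leave the identifier column alone.
cols = df.columns[df.dtypes.eq("object")].drop("id")

# apply works column by column by default, so axis=1 is not needed here.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df)
```

Applying column-wise hands each whole column to pd.to_numeric once, which is usually cheaper than going row by row with axis=1.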