NOTE:
pd.convert_objects has since been deprecated (and is not available in current pandas). You should use pd.Series.astype(float) or pd.to_numeric, as described in other answers.
This is available in 0.11. It forces conversion (or sets unconvertible values to NaN).
This works even when astype would fail. It also operates series by series,
so it won't convert, say, a column made up entirely of non-numeric strings.
In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
A B
0 1.0 1.0
1 1 foo
In [12]: df.dtypes
Out[12]:
A object
B object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
A B
0 1 1
1 1 NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A float64
B float64
dtype: object
Answer from Jeff on Stack Overflow
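Because convert_objects no longer exists in current pandas, here is a minimal sketch of the same conversion using pd.to_numeric with errors='coerce', which the deprecation note points to. It rebuilds the example DataFrame; applying to_numeric column by column through DataFrame.apply is one way to do this, not the only one.

```python
import pandas as pd

# Same data as the session above: column B contains a non-numeric string
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]})

# Convert each column; errors="coerce" turns unparseable values into NaN
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

One difference from the behavior described above: to_numeric with errors='coerce' will turn a column consisting only of non-numeric strings into all NaN rather than leaving it alone.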
You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to decide what they should become; the .fillna method lets you specify that.
Example:
In [12]: df
Out[12]:
a b
0 0.1 0.2
1 NaN 0.3
2 0.4 0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
a b
0 0.1 0.2
1 0.0 0.3
2 0.4 0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
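A self-contained version of the session above, assuming column a holds numeric strings plus a missing value (np.nan), as shown in Out[13]. Bracket indexing is used for the assignment here, which also works when a column name is not a valid Python identifier.

```python
import numpy as np
import pandas as pd

# Column "a" holds strings and a NaN, like the object array shown in Out[13]
df = pd.DataFrame({"a": ["0.1", np.nan, "0.4"], "b": [0.2, 0.3, 0.5]})

# astype(float) parses the strings (NaN stays NaN); fillna then replaces NaN with 0.0
df["a"] = df["a"].astype(float).fillna(0.0)

print(df)
```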
Starting in 0.11.1, replace has a new option to replace using a regex, so converting percentage strings such as '10.0%' to floats becomes possible:
In [14]: df = DataFrame('10.0%',index=range(100),columns=range(10))
In [15]: df.replace('%','',regex=True).astype('float')/100
Out[15]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
0 100 non-null values
1 100 non-null values
2 100 non-null values
3 100 non-null values
4 100 non-null values
5 100 non-null values
6 100 non-null values
7 100 non-null values
8 100 non-null values
9 100 non-null values
dtypes: float64(10)
It is also a bit faster than applymap:
In [16]: %timeit df.replace('%','',regex=True).astype('float')/100
1000 loops, best of 3: 1.16 ms per loop
In [18]: %timeit df.applymap(lambda x: float(x[:-1]))/100
1000 loops, best of 3: 1.67 ms per loop
Alternatively, strip the '%' cell by cell with applymap:
df.applymap(lambda x: float(x.rstrip('%'))/100)
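For reference, a runnable sketch of the regex replace approach next to a vectorized string alternative. It assumes every cell is a string ending in '%', as in the example above; the .str.rstrip variant is a swapped-in technique, not one from the answers. Note also that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which is one reason to prefer the column-wise .str accessor.

```python
import pandas as pd

# Small frame of percentage strings, like the 100x10 example above
df = pd.DataFrame("10.0%", index=range(3), columns=range(2))

# Approach from the answer: strip "%" with a regex replace, then cast and scale
via_replace = df.replace("%", "", regex=True).astype(float) / 100

# Alternative: strip the trailing "%" column by column with the .str accessor
via_strip = df.apply(lambda col: col.str.rstrip("%").astype(float)) / 100

print(via_replace)
```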
I have a pandas DataFrame, populated using pandas.read_csv. Most of the values in the CSV file are numeric, but one column, 'Parameter 3', contains a couple of string values. Because of these two string values, every other value in this column is interpreted as a string too.
I can deal with this using the following code, but it is not very elegant:
for i in df_distns.index:
    try:
        df_distns.loc[i, 'Parameter 3'] = float(df_distns.loc[i, 'Parameter 3'])
    except (TypeError, ValueError):
        # leave entries that cannot be parsed as float unchanged
        pass
Can anyone recommend a nicer way to do this (without looping)?
Assuming all values can be converted to float, you can use the DataFrame.astype() method to convert the whole DataFrame to float. Example:
df = df.astype(float)
Demo:
In [5]: df = pd.DataFrame(np.array([['1', '2', '3'], ['4', '5', '6']]))
In [6]: df.astype(float)
Out[6]:
0 1 2
0 1 2 3
1 4 5 6
In [7]: df = df.astype(float)
In [8]: df.dtypes
Out[8]:
0 float64
1 float64
2 float64
dtype: object
.astype() also has a raise_on_error argument (default True) that you can set to False to make it ignore errors. In that case, the original values are kept in the DataFrame. (In newer pandas versions, raise_on_error was replaced by errors='ignore'.)
In [10]: df = pd.DataFrame([['1', '2', '3'], ['4', '5', '6'],['blah','bloh','bleh']])
In [11]: df.astype(float,raise_on_error=False)
Out[11]:
0 1 2
0 1 2 3
1 4 5 6
2 blah bloh bleh
To convert just a single Series/column to float, again assuming all values can be converted, you can use Series.astype(). Example:
df['somecol'] = df['somecol'].astype(float)
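If only some columns should be converted (the "multiple columns" case), astype also accepts a dict mapping column names to dtypes. A minimal sketch with hypothetical column names a, b and c:

```python
import pandas as pd

# Hypothetical frame: a and b hold numeric strings, c holds text
df = pd.DataFrame({"a": ["1", "2"], "b": ["3.5", "4"], "c": ["x", "y"]})

# Convert only the listed columns; c is left untouched
df = df.astype({"a": float, "b": float})

print(df.dtypes)
```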
Another option is to use df.convert_objects(convert_numeric=True). It attempts to
convert numeric strings to numbers, with unconvertible values becoming NaN:
import pandas as pd
df = pd.DataFrame([['1', '2', '3'], ['4', '5', 'foo'], ['bar', 'baz', 'quux']])
df = df.convert_objects(convert_numeric=True)
print(df)
yields
0 1 2
0 1 2 3
1 4 5 NaN
2 NaN NaN NaN
In contrast, df.astype(float) would raise ValueError: could not convert string to float: quux, since some strings in the DataFrame above (such as 'quux') are not numeric.
Note: rather than the argument changing from convert_numeric=True to numeric=True after 0.16.2, convert_objects itself was deprecated in pandas 0.17.0 in favor of type-specific converters such as pd.to_numeric.
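In current pandas the same result can be sketched with pd.to_numeric and errors='coerce', applied to each column. This reuses the DataFrame from the answer and also checks the contrast with astype(float) described above.

```python
import pandas as pd

df = pd.DataFrame([["1", "2", "3"], ["4", "5", "foo"], ["bar", "baz", "quux"]])

# Coerce column by column: anything that cannot be parsed becomes NaN
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric)

# astype(float) has no coercion, so the non-numeric strings make it raise
try:
    df.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True
```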
Hello Guys,
I have a question regarding DataFrames. I have a piece of code that looks similar to this:
import os
import pandas as pd
import numpy as np
file_paths = ('C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co15-FTO_calc_WS_LS.txt', 'C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co16-FTO_calc_WS_LS.txt')
files_infos = pd.DataFrame()
for n, file_path in enumerate(file_paths):
    file_name = os.path.basename(file_path)
    file_name = file_name.split(".txt")[0]
    files_infos[file_name] = [np.nan] * len(files_infos)
    files_infos.at["file_path", file_name] = file_path
If I run this script, I get this error: ValueError: could not convert string to float: 'C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co16-FTO_calc_WS_LS.txt'
I just don't understand why pandas tries to convert my string into a float. I thought maybe it has something to do with the dtype of the DataFrame, but I couldn't really find an answer (the dtype is object). What I find really confusing about this error is that I used the same approach in other projects and it didn't occur before.
Can someone maybe explain to me why this error occurs and what I have to look up to find a solution? Please don't give me a solution to my problem, since I would like to solve it myself in order to learn.
Thank you for your help in advance.
I don't know if I fully understood your question, but you can try:
dataframe4['column2'] = dataframe4['column1'].apply(lambda x : float(x))
Edit: If some numbers contain commas, you can try:
dataframe4['column2'] = dataframe4['column1'].apply(lambda x : float(x.replace(",","")))
The problem appears to be that you have commas in your floats, e.g. '9,826.000'.
You can fix it like this:
import re
re.sub(r",", "", "1,1000.20")
# returns '11000.20' and the below works
float(re.sub(r",", "", "1,1000.20"))
# you can e.g. use apply to apply to all your numbers in the DataFrame
df["new_col"] = df["old_col"].apply(lambda x: float(re.sub(r",", "", x)))
To still display the resulting floats with commas in pandas afterwards, you can change the float display setting as described here.
I don't know how you want to output these, but in to_excel, for example, you can specify a float format (cf. here), or re-format the column before output, similar to the above. See this answer for some ideas.
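A vectorized sketch of the comma-stripping idea, using the .str accessor instead of apply with re.sub, plus one way to format the floats with thousands separators again before output. The example values are made up; str.replace with regex=False treats ',' as a literal character.

```python
import pandas as pd

# Hypothetical column of numbers written with thousands separators
s = pd.Series(["9,826.000", "1,234.5", "12"])

# Remove the commas, then parse as float
values = s.str.replace(",", "", regex=False).astype(float)

# For output only: format back to strings with commas and two decimals
formatted = values.map("{:,.2f}".format)

print(values.tolist())
print(formatted.tolist())
```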
I want to do some math on a dataframe but (I think) can't get one column/series into the necessary format. The column contains strings; some are '.123' while others are '0'. When I attempt the math on the column of strings by converting everything to an integer like so:
dfteam1['cloff'] = dfteam1.cloff.astype(int)
I get the following error
ValueError: invalid literal for int() with base 10: '.123'
I think it's because .123 isn't an integer but a float, so I changed the code like so:
dfteam1['cloff'] = dfteam1.cloff.astype(float)
now I get the following error
ValueError: could not convert string to float:
I think it's because 0 isn't a float but an integer? Do I need to change all the 0 values to 0.00, or am I completely off base? All feedback is welcome.
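Neither '.123' nor '0' is the problem here: both parse as floats. The error message ends right after the colon, which suggests (an assumption, not confirmed in the question) that the column also contains empty strings. A minimal sketch with hypothetical values:

```python
import pandas as pd

# '.123' and '0' both convert without trouble
ok = pd.Series([".123", "0"]).astype(float)

# Hypothetical column that also contains an empty string
s = pd.Series([".123", "0", "", "0.5"])

try:
    s.astype(float)
    failed = False
except ValueError:
    failed = True  # the empty string cannot be parsed as a float

# to_numeric with errors="coerce" turns the empty string into NaN instead
converted = pd.to_numeric(s, errors="coerce")
print(converted)
```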