NOTE:
pd.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in other answers.
This is available in 0.11. It forces conversion (or sets the value to NaN).
This will work even when astype will fail; it's also applied series by series,
so it won't convert, say, a complete string column.
In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
     A    B
0  1.0  1.0
1    1  foo
In [12]: df.dtypes
Out[12]:
A    object
B    object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
   A    B
0  1    1
1  1  NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A    float64
B    float64
dtype: object
Answer from Jeff on Stack Overflow
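Since the note above points to pd.to_numeric as the replacement, here is a minimal sketch of what the same example might look like on a current pandas version. It assumes that applying pd.to_numeric column by column with errors="coerce" is an acceptable substitute for convert_objects(convert_numeric=True); the variable name converted is only illustrative.

```python
import pandas as pd

# Same data as in the session above: every value is stored as a string.
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]})

# Convert each column independently; values that cannot be parsed
# (here 'foo') become NaN instead of raising a ValueError.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Note that, unlike the old behaviour described above, errors="coerce" would also turn a column made up entirely of strings into NaN, so it is probably best applied only to the columns you expect to be numeric.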
You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to specify how they should be converted, but you can use the .fillna method to do it.
Example:
In [12]: df
Out[12]:
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
Hello Guys,
I have a question regarding DataFrames. I have a line of code, which looks similar to this:
import os
import pandas as pd
import numpy as np
file_paths = ('C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co15-FTO_calc_WS_LS.txt', 'C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co16-FTO_calc_WS_LS.txt')
files_infos = pd.DataFrame()
for n, file_path in enumerate(file_paths):
    file_name = os.path.basename(file_path)
    file_name = file_name.split(".txt")[0]
    files_infos[file_name] = [np.nan] * len(files_infos)
    files_infos.at["file_path", file_name] = file_path
If I run this script, I get this error: ValueError: could not convert string to float: 'C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co16-FTO_calc_WS_LS.txt'
I just don't understand why pandas tries to convert my string into a float. I thought maybe it has something to do with the dtype of the DataFrame, but I couldn't really find an answer (the dtype is object). What I find really confusing about this error is that I used the same approach in different projects and it didn't occur before.
Can someone maybe explain to me why this error occurs and what I have to look up to find a solution? Please don't give me a solution to my problem, since I would like to solve it myself in order to learn.
Thank you for your help in advance.
I want to do some math on a dataframe but (I think) can't get one column/series into the necessary format. The column contains strings; some are '.123' while others are '0'. When I attempt the math on the column of strings by converting everything to an integer like so:
dfteam1['cloff'] = dfteam1.cloff.astype(int)
I get the following error
ValueError: invalid literal for int() with base 10: '.123'
I think it's b/c .123 isn't an integer but a float, so I change the code like so:
dfteam1['cloff'] = dfteam1.cloff.astype(float)
now I get the following error
ValueError: could not convert string to float:
I think it's b/c 0 isn't a float but an integer? Do I need to change all the 0 values to 0.00 or am I completely off base? All feedback is welcome.
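For what it's worth, '0' converts to a float without trouble, so the zeros are unlikely to be the cause. The empty text after the colon in the second error may suggest that the column contains an empty or blank string. A minimal sketch for locating such entries follows; the sample values, including the empty string, are assumptions standing in for the real dfteam1.cloff column.

```python
import pandas as pd

# Hypothetical stand-in for dfteam1.cloff; the '' entry is an assumption.
cloff = pd.Series([".123", "0", "", ".5"], name="cloff")

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric = pd.to_numeric(cloff, errors="coerce")

# Entries that were present in the original data but failed to parse.
bad = cloff[numeric.isna() & cloff.notna()]

print(numeric.tolist())
print(bad.tolist())
```

Once the offending entries are known, you can decide whether to drop them, fill them with .fillna, or fix them at the source before doing the math.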
I faced the same error while trying to create a heatmap. The following code solved my problem:
hp_train.corr(numeric_only=True)
You can check column dtypes using hp_train.dtypes. Subset the dataframe for only the desired columns before calling corr.
For example, if you only want the float64 columns:
dtype_df = hp_train.dtypes
float_cols = dtype_df.iloc[(dtype_df=='float64').values].index
hp_train[float_cols].corr()
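An alternative way to subset the columns is select_dtypes, which picks up integer columns as well as floats. This is only a sketch: the frame below is a made-up stand-in for hp_train, and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame standing in for hp_train.
hp_train = pd.DataFrame({
    "price": [100.0, 150.0, 200.0],
    "area": [50.0, 70.0, 90.0],
    "rooms": [2, 3, 4],
    "city": ["A", "B", "C"],  # string column that would break corr()
})

# Keep only numeric columns (ints and floats), then correlate.
numeric_cols = hp_train.select_dtypes(include="number")
corr = numeric_cols.corr()

print(corr)
```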