NOTE:
pd.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in other answers.
This is available in 0.11. It forces conversion (or sets the value to NaN).
This will work even when astype fails; it is also applied series by series,
so it won't convert, say, a complete string column.
In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))
In [11]: df
Out[11]:
A B
0 1.0 1.0
1 1 foo
In [12]: df.dtypes
Out[12]:
A object
B object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
A B
0 1 1
1 1 NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A float64
B float64
dtype: object
Answer from Jeff on Stack Overflow
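Since convert_objects is deprecated, the same coercion can be sketched with pd.to_numeric. This is only a minimal example, assuming the same two-column frame of strings as in the session above; the variable name converted is made up for illustration.

```python
import pandas as pd

# Same data as in the session above: both columns hold strings.
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]})

# Apply pd.to_numeric column by column; errors="coerce" turns values
# that cannot be parsed (here "foo") into NaN instead of raising.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Unlike convert_objects, errors="coerce" will also turn a column made up entirely of non-numeric strings into all NaN, so it is probably safer to apply it only to the columns you expect to be numeric.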
You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to specify how they should be converted, but you can use the .fillna method to do it.
Example:
In [12]: df
Out[12]:
a b
0 0.1 0.2
1 NaN 0.3
2 0.4 0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
a b
0 0.1 0.2
1 0.0 0.3
2 0.4 0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
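The session above does not show how df was built. A self-contained sketch of the same steps could look like this, assuming column a holds strings plus one missing value and column b is already numeric (the construction of df is my assumption, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Assumed input: 'a' is an object column of strings with one missing value.
df = pd.DataFrame({"a": ["0.1", np.nan, "0.4"], "b": [0.2, 0.3, 0.5]})

# astype(float) parses the strings and keeps the missing value as NaN;
# fillna(0.0) then replaces that NaN with an explicit default.
df["a"] = df["a"].astype(float).fillna(0.0)

print(df)
print(df["a"].values)
```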
Hello Guys,
I have a question regarding DataFrames. I have a line of code, which looks similar to this:
import os
import pandas as pd
import numpy as np
file_paths = ('C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co15-FTO_calc_WS_LS.txt', 'C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co16-FTO_calc_WS_LS.txt')
files_infos = pd.DataFrame()
for n, file_path in enumerate(file_paths):
    file_name = os.path.basename(file_path)
    file_name = file_name.split(".txt")[0]
    files_infos[file_name] = [np.nan] * len(files_infos)
    files_infos.at["file_path", file_name] = file_path
If I run this script I get this error: ValueError: could not convert string to float: 'C:/Users/DR/Documents/Polymer Science/Mitarbeiterpraktika/Forschungsmodul I Elektrochemie/Wasserspaltung/Co16-FTO_calc_WS_LS.txt'
I just don't understand why pandas tries to convert my string into a float. I thought maybe it has something to do with the dtype of the DataFrame, but I couldn't really find an answer (the dtype is object). What I find really confusing about this error is that I used the same approach in different projects and it didn't occur before.
Can someone maybe explain to me why this error occurs and what I have to look up to find a solution? Please don't give me a solution to my problem, since I would like to solve it myself in order to learn.
Thank you for your help in advance.
I want to do some math on a dataframe but (I think) can't get one column/series into the necessary format. The column contains strings; some are '.123' while others are '0'. When I attempt the math on the column of strings by converting everything to an integer like so:
dfteam1['cloff'] = dfteam1.cloff.astype(int)
I get the following error
ValueError: invalid literal for int() with base 10: '.123'
I think it's b/c .123 isn't an integer but a float, so I change the code like so:
dfteam1['cloff'] = dfteam1.cloff.astype(float)
now I get the following error
ValueError: could not convert string to float:
I think it's b/c 0 isn't a float but an integer? Do I need to change all the 0 values to 0.00 or am I completely off base? All feedback is welcome.
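For what it's worth, '0' converts to float without trouble, so the zeros are unlikely to be the cause. The empty text after the colon in the second error may indicate that at least one value in the column is an empty (or whitespace-only) string. A minimal sketch for finding and coercing such values, assuming a column like the one described (the sample data and the names as_float and bad are made up for illustration):

```python
import pandas as pd

# Hypothetical sample resembling the column described in the question.
cloff = pd.Series([".123", "0", ""], name="cloff")

# errors="coerce" converts what it can and turns everything else into NaN.
as_float = pd.to_numeric(cloff, errors="coerce")

# Rows that became NaN show which original values could not be parsed
# (values that were already missing would show up here as well).
bad = cloff[as_float.isna()]

print(as_float)
print(bad)
```

Inspecting bad first tells you what the offending entries actually look like before you decide whether NaN, 0, or dropping the rows is the right treatment.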
To convert from string to float in pandas (assuming you want to convert Employees and you loaded the data frame with df), you can use:
df['Employees'] = df['Employees'].apply(lambda x: float(x))
You have not given enough information about your input and expected output. So let us assume that hospital name or anything for that matter which is the input for your model is nan. You would like to remove it from the dataset because extracting features from 'nan' wouldn't make sense. Apart from that, if they are just other peripheral features, then it might be alright. In that case, if you wish to convert them into blank, then use:
df.replace(np.nan, ' ', regex=True)
Else, if you wish to remove that frame, you can check for nan using this.
The best way to deal with types is to specify it when ingesting the file:
pandas.read_csv(file_name, dtype={"Employees": float})
What you do with the missing data in Keras is up to you. You can elaborate further as it actually depends on your plan.
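As a rough illustration of setting the type at ingestion, here is a small sketch that reads from an in-memory string instead of a file. The CSV content and the Name column are invented for the example; only the Employees column and the dtype argument come from the answer above.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has no Employees value.
csv_text = "Name,Employees\nAcme,12\nGlobex,\n"

# dtype forces Employees to float while parsing; the empty field becomes NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Employees": float})

print(df)
print(df.dtypes)
```

Note that this only works when every non-empty value in the column is numeric; if the column contains other text, read_csv raises a ValueError and the values need to be cleaned or coerced after loading instead.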