This is more concise:
import numpy as np
# select the float columns (np.floating instead of np.float, which NumPy 1.24 removed)
df_float = df.select_dtypes(include=[np.floating])
# select the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])
Answer from RNA on Stack Overflow
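A runnable sketch of the select_dtypes approach, assuming pandas and a current NumPy (hence np.floating rather than the removed np.float alias). The three-column frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one int, one float and one string column
df = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5], "s": ["a", "b"]})

# np.floating matches every float width (float16/32/64)
df_float = df.select_dtypes(include=[np.floating])

# np.number covers ints and floats, so excluding it leaves the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])

print(list(df_float.columns))    # ['x']
print(list(df_non_num.columns))  # ['s']
```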
You can see what the dtype is for all the columns using the dtypes attribute:
In [11]: df = pd.DataFrame([[1, 'a', 2.]])
In [12]: df
Out[12]:
   0  1  2
0  1  a  2
In [13]: df.dtypes
Out[13]:
0 int64
1 object
2 float64
dtype: object
In [14]: df.dtypes == object
Out[14]:
0 False
1 True
2 False
dtype: bool
To access the object columns:
In [15]: df.loc[:, df.dtypes == object]
Out[15]:
   1
0  a
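A self-contained version of the dtype mask, assuming a frame shaped like the one in the session. The string column is built with dtype=object explicitly, because newer pandas versions (or the future.infer_string option) may infer a dedicated string dtype that does not compare equal to object. The comparison with select_dtypes(include='object') is an added cross-check, not part of the original answer:

```python
import pandas as pd

# Same shape as the session above: int, object and float columns labelled 0, 1, 2
df = pd.DataFrame({0: [1], 1: pd.Series(["a"], dtype=object), 2: [2.0]})

mask = df.dtypes == object   # boolean Series indexed by column label
obj_cols = df.loc[:, mask]   # keep only the columns where mask is True

# Equivalent selection with select_dtypes
same = df.select_dtypes(include="object")

print(mask.tolist())              # [False, True, False]
print(obj_cols.columns.tolist())  # [1]
```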
I think the most explicit approach is to assign the result back (calling fillna with inplace=True on the .loc selection would only modify a copy, not df):
In [16]: df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('')
That said, I recommend using NaN for missing data.
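A sketch of that assignment on hypothetical data with missing values in both an object column and a float column. Only the object column is filled with '', while the float column keeps its NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: missing values in both an object and a float column
df = pd.DataFrame({
    "name": pd.Series(["x", None, "z"], dtype=object),
    "score": [1.0, np.nan, 3.0],
})

obj_mask = df.dtypes == object
# Assign back through .loc so the change lands in df itself rather than in a copy
df.loc[:, obj_mask] = df.loc[:, obj_mask].fillna("")

print(df["name"].tolist())          # ['x', '', 'z']
print(df["score"].isna().tolist())  # [False, True, False]
```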
I don't use floats anywhere in my program, yet every so often I get an (easy to fix) error saying that an input requires an int but a float was provided. Here is some recent code, but this isn't the first time something like this has happened. I'm looking for the general reason rather than an explanation of this particular case.
x = df.loc[(df['FROM2'] > 599) & (df['FROM2'] < 700) & (df['y'] == True)]
z = pd.concat([z, x])
Then later in the code:
a = pd.merge(a, z, how='outer', indicator=True)
a = a.loc[a['_merge'] == 'left_only'].copy()
a.drop(columns='_merge', inplace=True)
NOTE: convert_objects has been deprecated and no longer exists in current pandas. You should use pd.Series.astype(float) or pd.to_numeric, as described in other answers.
This is available in 0.11. It forces conversion, setting anything that cannot be converted to NaN.
This works even when astype would fail; it also operates series by series, so it won't convert, say, a column made up entirely of strings.
In [10]: df = pd.DataFrame(dict(A = pd.Series(['1.0','1']), B = pd.Series(['1.0','foo'])))
In [11]: df
Out[11]:
     A    B
0  1.0  1.0
1    1  foo
In [12]: df.dtypes
Out[12]:
A object
B object
dtype: object
In [13]: df.convert_objects(convert_numeric=True)
Out[13]:
   A    B
0  1    1
1  1  NaN
In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]:
A float64
B float64
dtype: object
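Because convert_objects is no longer available, here is a sketch of the same conversion using pd.to_numeric. It works on one Series at a time, so it is applied to each column through DataFrame.apply. With errors='coerce', values that cannot be parsed (such as 'foo') become NaN, mirroring the convert_numeric=True output above. The columns are created as object dtype to match the session:

```python
import pandas as pd

# Same data as the session: numeric-looking strings plus one non-numeric value
df = pd.DataFrame({"A": ["1.0", "1"], "B": ["1.0", "foo"]}, dtype=object)

# to_numeric handles one Series at a time, so apply it column by column;
# errors="coerce" turns values it cannot parse into NaN instead of raising
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```

One difference worth checking: with errors='coerce', a column made up entirely of non-numeric strings becomes all NaN rather than being left alone, so you may want to apply it only to the columns you expect to be numeric.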
You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to decide what they should be converted to; the .fillna method does that.
Example:
In [12]: df
Out[12]:
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5
In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)
In [14]: df.a = df.a.astype(float).fillna(0.0)
In [15]: df
Out[15]:
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5
In [16]: df.a.values
Out[16]: array([ 0.1, 0. , 0.4])
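The session above does not show how df was created. This sketch rebuilds it under the assumption, suggested by the df.a.values output, that column a held the strings '0.1' and '0.4' plus a NaN, then applies the same astype/fillna step. It also shows the case where astype(float) does fail, which is where pd.to_numeric(..., errors='coerce') is the better fit:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the session's frame: column "a" is object dtype
df = pd.DataFrame({
    "a": pd.Series(["0.1", np.nan, "0.4"], dtype=object),
    "b": [0.2, 0.3, 0.5],
})

# astype(float) parses the numeric strings and keeps NaN as NaN;
# fillna(0.0) then decides what the missing value becomes
df["a"] = df["a"].astype(float).fillna(0.0)

print(df["a"].tolist())  # [0.1, 0.0, 0.4]

# astype(float) raises on text it cannot parse
try:
    pd.Series(["0.1", "foo"], dtype=object).astype(float)
except ValueError as exc:
    print("astype failed:", exc)
```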