- If the dataframe (say df) consists wholly of float64 dtypes, you can do:
df = df.astype('float32')
- If only some columns are float64, you have to select those columns and change their dtype:
# Select columns with 'float64' dtype
float64_cols = list(df.select_dtypes(include='float64'))
# Cast only the selected columns
df[float64_cols] = df[float64_cols].astype('float32')
Answer from Shiva Govindaswamy on Stack Overflow
Try this:
df[df.select_dtypes(np.float64).columns] = df.select_dtypes(np.float64).astype(np.float32)
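As a variation on the answers above, here is a minimal sketch that passes a column-to-dtype dict to astype, so only the float64 columns are touched. The frame and its column names (x, y, n, label) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes, for illustration only.
df = pd.DataFrame({
    "x": np.array([0.1, 0.2, 0.3], dtype=np.float64),
    "y": np.array([1.5, 2.5, 3.5], dtype=np.float64),
    "n": np.array([1, 2, 3], dtype=np.int64),
    "label": ["a", "b", "c"],
})

# Find the float64 columns, then downcast just those via a dict.
float64_cols = df.select_dtypes(include="float64").columns
df = df.astype({col: "float32" for col in float64_cols})

print(df.dtypes)
```

Columns that are not listed in the dict keep their original dtype.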
I think this does what you want:
pd.read_csv('Filename.csv').dropna().astype(np.float32)
To keep rows that only have some NaN values, do this:
pd.read_csv('Filename.csv').dropna(how='all').astype(np.float32)
To replace each NaN with a number instead of dropping rows, do this:
pd.read_csv('Filename.csv').fillna(1e6).astype(np.float32)
(I replaced NaN with 1,000,000 just as an example.)
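To compare the three variants side by side, here is a small sketch that reads an in-memory CSV instead of 'Filename.csv'. The CSV text and variable names are assumptions made for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV: one complete row, one partly empty row, one fully empty row.
csv_text = "a,b\n1.0,2.0\n,4.0\n,\n"
raw = pd.read_csv(io.StringIO(csv_text))

# Drop every row that has at least one NaN.
strict = raw.dropna().astype(np.float32)

# Drop only the rows in which every value is NaN.
partial = raw.dropna(how="all").astype(np.float32)

# Keep all rows and replace NaN with a placeholder value.
filled = raw.fillna(1e6).astype(np.float32)

print(len(strict), len(partial), len(filled))
```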
You can also specify the dtype when you read the csv file:
dtype : Type name or dict of column -> type. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
pd.read_csv(my_file, dtype={col: np.float32 for col in ['col_1', 'col_2']})
Example:
df_out = pd.DataFrame(np.random.random([5,5]), columns=list('ABCDE'))
df_out.iat[1,0] = np.nan
df_out.iat[2,1] = np.nan
df_out.to_csv('my_file.csv')
df = pd.read_csv('my_file.csv', dtype={col: np.float32 for col in list('ABCDE')})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 6 columns):
Unnamed: 0 5 non-null int64
A 4 non-null float32
B 4 non-null float32
C 5 non-null float32
D 5 non-null float32
E 5 non-null float32
dtypes: float32(5), int64(1)
memory usage: 180.0 bytes
>>> df.dropna(axis=0, how='any')
Unnamed: 0 A B C D E
0 0 0.176224 0.943918 0.322430 0.759862 0.028605
3 3 0.723643 0.105813 0.884290 0.589643 0.913065
4 4 0.654378 0.400152 0.763818 0.416423 0.847861
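The extra "Unnamed: 0" int64 column in the output above is the index that to_csv wrote into the file. If that column is not wanted, one option is to skip the index when writing. This is a sketch of the same example that uses an in-memory buffer in place of 'my_file.csv':

```python
import io

import numpy as np
import pandas as pd

df_out = pd.DataFrame(np.random.random([5, 5]), columns=list("ABCDE"))
df_out.iat[1, 0] = np.nan
df_out.iat[2, 1] = np.nan

# Write without the index so no "Unnamed: 0" column appears on read.
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)
buffer.seek(0)

df = pd.read_csv(buffer, dtype={col: np.float32 for col in "ABCDE"})
print(df.dtypes)
```

Reading with index_col=0 would be the alternative if the file has already been written with its index.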
I think it is worth posting this as a GitHub issue. The behavior is certainly inconsistent.
The code takes a different branch based on whether the DataFrame is mixed-type or not (source).
In the mixed-type case the ndarray is converted to a Python list of float64 numbers and then converted back into float64 ndarray disregarding the DataFrame's dtypes information (function maybe_convert_objects()).
In the non-mixed-type case the DataFrame content is updated pretty much directly (source) and the DataFrame keeps its float32 dtypes.
Not an answer, but my recreation of the problem:
In [2]: df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)
In [3]: df.dtypes
Out[3]:
0 float32
1 float32
2 object
dtype: object
In [4]: A=df.ix[:,:1].values
In [5]: A
Out[5]:
array([[ 1., 2.],
[ 3., 4.]], dtype=float32)
In [6]: df.ix[:,:1] = A
In [7]: df.dtypes
Out[7]:
0 float64
1 float64
2 object
dtype: object
In [8]: pd.__version__
Out[8]: '0.15.0'
I'm not as familiar with pandas as numpy, but I'm puzzled as to why ix[:,:1] gives me a 2 column result. In numpy that sort of indexing gives just 1 column.
If I assign a single column, the dtype does not change:
In [47]: df.ix[:,[0]]=A[:,0]
In [48]: df.dtypes
Out[48]:
0 float32
1 float32
2 object
The same actions without mixed datatypes do not change the dtypes:
In [100]: df1 = pd.DataFrame([[1, 2, 1.23], [3, 4, 3.32]], dtype=np.float32)
In [101]: A1=df1.ix[:,:1].values
In [102]: df1.ix[:,:1]=A1
In [103]: df1.dtypes
Out[103]:
0 float32
1 float32
2 float32
dtype: object
The key must be that with mixed values, the dataframe is, in one sense or other, a dtype=object array, whether that's true of its internal data storage, or just its numpy interface.
In [104]: df1.as_matrix()
Out[104]:
array([[ 1. , 2. , 1.23000002],
[ 3. , 4. , 3.31999993]], dtype=float32)
In [105]: df.as_matrix()
Out[105]:
array([[1.0, 2.0, 'a'],
[3.0, 4.0, 'b']], dtype=object)
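The session above uses .ix and .as_matrix(), both of which were removed in pandas 1.0. A rough modern equivalent, assuming a recent pandas, uses .loc/.iloc and .to_numpy(). It also shows why ix[:, :1] returned two columns: with integer column labels the slice was treated as label-based, and label slices include the end point. The frame here is built column by column because recent pandas versions may refuse dtype=np.float32 for a frame that contains strings.

```python
import numpy as np
import pandas as pd

# Same shape as the session above: two float32 columns and one string column.
mixed = pd.DataFrame({
    0: np.array([1, 3], dtype=np.float32),
    1: np.array([2, 4], dtype=np.float32),
    2: ["a", "b"],
})
uniform = mixed[[0, 1]].copy()

# Label-based slicing includes the end label, position-based slicing does not.
by_label = mixed.loc[:, :1]      # columns labelled 0 and 1
by_position = mixed.iloc[:, :1]  # first column only

# A single-dtype frame keeps float32; a mixed frame falls back to object.
print(uniform.to_numpy().dtype)
print(mixed.to_numpy().dtype)
```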
pandas 0.10.1 doesn't really support float32 very well; see http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification
You can do this in 0.11 like this:
# don't use dtype converters explicitly for the columns you care about
# they will be converted to float64 if possible, or object if they cannot
df = pd.read_csv('test.csv'.....)
#### this is optional and related to the issue you posted ####
# force anything that is not a numeric to nan
# columns is the list of columns that you are interested in
df[columns] = df[columns].convert_objects(convert_numeric=True)
# astype
df[columns] = df[columns].astype('float32')
See http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion
It's not as efficient as doing it directly in read_csv (but that requires some low-level changes).
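convert_objects was later deprecated and is gone from pandas 1.0 onward. On current versions the same two steps (coerce non-numeric values to NaN, then cast) can be sketched with pd.to_numeric. The frame and the columns list below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data as it might come out of read_csv with dirty values.
df = pd.DataFrame({
    "a": ["0.5", "oops", "2.0"],
    "b": ["1", "2", "3"],
    "name": ["x", "y", "z"],
})
columns = ["a", "b"]  # the columns you are interested in

for col in columns:
    # Anything that cannot be parsed as a number becomes NaN, then cast down.
    df[col] = pd.to_numeric(df[col], errors="coerce").astype("float32")

print(df.dtypes)
```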
I have confirmed that with 0.11-dev, this DOES work (on 32-bit and 64-bit, results are the same)
In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)
In [6]: x
Out[6]:
a b
0 0.76398 0.81394
1 0.32136 0.91063
In [7]: x.dtypes
Out[7]:
a float32
b float64
dtype: object
In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'
In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
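For readers on Python 3, here is a sketch of the same check. It assumes a whitespace-separated input reconstructed from the output shown above (the original data string is not given), uses io.StringIO in place of StringIO.StringIO, and uses sep=r"\s+" as the whitespace delimiter.

```python
import io

import numpy as np
import pandas as pd

# Assumed input, rebuilt from the printed frame above.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Only column 'a' is forced to float32; 'b' gets the default float64.
x = pd.read_csv(io.StringIO(data), dtype={"a": np.float32}, sep=r"\s+")
print(x.dtypes)
```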
In [22]: df.a.dtype = pd.np.float32
In [23]: df.a.dtype
Out[23]: dtype('float32')
The above works fine for me under pandas 0.10.1.
Yes, when you use Python's native float to specify the dtype for an array, numpy converts it to float64. As given in the documentation:
Note that, above, we use the Python float object as a dtype. NumPy knows that int refers to np.int_, bool means np.bool_, that float is np.float_ and complex is np.complex_. The other data-types do not have Python equivalents.
And:
float_ - Shorthand for float64.
This is why, even though you use float to convert the whole array to float, it still uses np.float64.
According to the requirement from the other question, the best solution would be to convert each scalar value to a normal float object after taking it out of the array:
float(new_array[0])
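A short sketch of that behaviour, using a made-up new_array. It refers to np.float64 directly because the np.float_ alias mentioned in the quoted documentation was removed in NumPy 2.0. It also shows tolist() as one way to get builtin floats for a whole array at once.

```python
import numpy as np

# Hypothetical array standing in for the one from the question.
new_array = np.array([1.0, 2.0, np.inf]).astype(float)

# Passing the builtin float as a dtype yields float64 storage.
print(new_array.dtype)

# Indexing returns a NumPy scalar; float() turns it into a builtin float.
numpy_scalar = new_array[0]
python_scalar = float(new_array[0])
print(type(numpy_scalar), type(python_scalar))

# tolist() converts every element to a builtin float.
as_python = new_array.tolist()
```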
A solution that I could think of is to create a subclass for float and use that for casting (though to me it looks bad). But I would prefer the previous solution over this if possible. Example -
In [20]: import numpy as np
In [21]: na = np.array([1., 2., 3.])
In [22]: na = np.array([1., 2., 3., np.inf, np.inf])
In [23]: type(na[-1])
Out[23]: numpy.float64
In [24]: na[-1] - na[-2]
C:\Anaconda3\Scripts\ipython-script.py:1: RuntimeWarning: invalid value encountered in double_scalars
if __name__ == '__main__':
Out[24]: nan
In [25]: class x(float):
....: pass
....:
In [26]: na_new = na.astype(x)
In [28]: type(na_new[-1])
Out[28]: float #No idea why its showing float, I would have thought it would show '__main__.x' .
In [29]: na_new[-1] - na_new[-2]
Out[29]: nan
In [30]: na_new
Out[30]: array([1.0, 2.0, 3.0, inf, inf], dtype=object)
You can create an anonymous type float like this
>>> new_array = my_array.astype(type('float', (float,), {}))
>>> type(new_array[0])
<type 'float'>
Use numpy.float32:
In [320]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.random.randn(10)})
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
a 10 non-null float64
dtypes: float64(1)
memory usage: 160.0 bytes
In [323]:
df['a'].astype(np.float32)
Out[323]:
0 0.966618
1 -0.331942
2 0.906349
3 -0.089582
4 -0.722004
5 0.668103
6 0.230314
7 -1.707631
8 1.806862
9 1.783765
Name: a, dtype: float32
You can see that the dtype is now float32
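Note that astype returns a new Series and leaves the dataframe itself untouched, so the result has to be assigned back for the change to stick. A minimal sketch, reusing the same single-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.random.randn(10)})

# astype builds a new Series; the column inside df is still float64 here.
converted = df["a"].astype(np.float32)
dtype_untouched = df["a"].dtype

# Assign the result back to actually change the column.
df["a"] = converted
print(df["a"].dtype, df["a"].nbytes)
```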
There is now a simpler solution than the accepted answer, without needing to import numpy:
.astype('float32')
Examples:
df['store'] = pd.DataFrame(data).astype('float32')
df['rating'] = (df['rating']/2).astype('float32')
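The string alias and the NumPy type give the same result. The sketch below imports numpy only to make that comparison; the ratings values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings column.
ratings = pd.Series([9.0, 7.0, 8.0])

by_string = (ratings / 2).astype("float32")
by_numpy = (ratings / 2).astype(np.float32)

print(by_string.dtype == by_numpy.dtype)
```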