Generally your idea of trying to apply astype to each column is fine.
In [590]: X[:,0].astype(int)
Out[590]: array([1, 2, 3, 4, 5])
But you have to collect the results in a separate list. You can't just put them back in X. That list can then be concatenated.
In [601]: numlist=[]; obj_ind=[]
In [602]: for ind in range(X.shape[1]):
.....:     try:
.....:         x = X[:,ind].astype(np.float32)
.....:         numlist.append(x)
.....:     except ValueError:   # column holds non-numeric strings
.....:         obj_ind.append(ind)
In [603]: numlist
Out[603]: [array([ 3., 4., 5., 6., 7.], dtype=float32)]
In [604]: np.column_stack(numlist)
Out[604]:
array([[ 3.],
       [ 4.],
       [ 5.],
       [ 6.],
       [ 7.]], dtype=float32)
In [606]: obj_ind
Out[606]: [1]
X is a numpy array with dtype object:
In [582]: X
Out[582]:
array([[1, 'A'],
       [2, 'A'],
       [3, 'C'],
       [4, 'D'],
       [5, 'B']], dtype=object)
You could use the same conversion logic to create a structured array with a mix of int and object fields.
In [616]: ytype=[]
In [617]: for ind in range(X.shape[1]):
.....:     try:
.....:         x = X[:,ind].astype(np.float32)
.....:         ytype.append('i4')
.....:     except ValueError:   # column holds non-numeric strings
.....:         ytype.append('O')
In [618]: ytype
Out[618]: ['i4', 'O']
In [620]: Y=np.zeros(X.shape[0],dtype=','.join(ytype))
In [621]: for i in range(X.shape[1]):
.....:     Y[Y.dtype.names[i]] = X[:,i]
In [622]: Y
Out[622]:
array([(3, 'A'), (4, 'A'), (5, 'C'), (6, 'D'), (7, 'B')],
      dtype=[('f0', '<i4'), ('f1', 'O')])
Y['f0'] gives the numeric field.
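For reference, the column-splitting loop can be condensed into a small standalone script. This is only a sketch: the sample array is rebuilt from the X displayed above, split_columns is a hypothetical helper name, and catching ValueError/TypeError instead of using a bare except is my own choice.

```python
import numpy as np

# Sample data rebuilt from the object array X displayed above.
X = np.array([[1, 'A'], [2, 'A'], [3, 'C'], [4, 'D'], [5, 'B']], dtype=object)

def split_columns(arr):
    """Return (float32 columns, indices of columns that could not be converted)."""
    numeric_cols, object_idx = [], []
    for ind in range(arr.shape[1]):
        try:
            numeric_cols.append(arr[:, ind].astype(np.float32))
        except (ValueError, TypeError):
            # Conversion failed, so remember the column instead of guessing.
            object_idx.append(ind)
    return numeric_cols, object_idx

numeric_cols, object_idx = split_columns(X)
# column_stack turns the list of 1-D columns into one 2-D array.
numeric = np.column_stack(numeric_cols)
print(numeric.shape, numeric.dtype, object_idx)
```

The original X is left untouched; the numeric data lives in a new float32 array and object_idx records which columns to handle separately.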
I think this might help
obj = []  # names of the columns that could not be converted to float

def func(x):
    try:
        return x.astype(float)
    except (ValueError, TypeError):
        # x.name is the current index value,
        # which is the column name in this case
        obj.append(x.name)
        return x

new_df = df.apply(func, axis=0)
This keeps the object columns unchanged and records their names in obj, so you can use them later.
Note: when working with a pandas.DataFrame, avoid iterating with explicit loops where possible, as this is typically much slower than performing the same operation with apply.
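To see the effect end to end, here is a minimal run on a made-up DataFrame. The frame, its column names ("value", "label") and the helper name to_float_or_keep are all hypothetical; only DataFrame.apply and Series.astype come from pandas itself.

```python
import pandas as pd

# Hypothetical input: one column of numeric strings, one of plain labels.
df = pd.DataFrame({"value": ["1.5", "2.5", "3.5"], "label": ["x", "y", "z"]})

obj = []  # collects the names of columns that stay non-numeric

def to_float_or_keep(col):
    """Convert a column to float, or return it unchanged and record its name."""
    try:
        return col.astype(float)
    except (ValueError, TypeError):
        obj.append(col.name)
        return col

new_df = df.apply(to_float_or_keep, axis=0)
print(new_df.dtypes)
print(obj)
```

After the call, new_df["value"] is a float column, while "label" is passed through as it was and its name ends up in obj.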
Yes. When you use Python's native float to specify the dtype of an array, numpy converts it to float64. As the documentation puts it -
Note that, above, we use the Python float object as a dtype. NumPy knows that
int refers to np.int_, bool means np.bool_, that float is np.float_ and complex is np.complex_. The other data-types do not have Python equivalents.
And -
float_ - Shorthand for float64.
This is why, even though you use float to convert the whole array, the result still has dtype np.float64.
Given the requirement from the other question, the best solution would be to convert each scalar value to a plain Python float after extracting it, as in -
float(new_array[0])
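A quick check of this, as a sketch with a made-up array (the variable names here are mine, not from the question):

```python
import numpy as np

# Passing the builtin float as the dtype yields a float64 array.
new_array = np.array([1, 2, 3]).astype(float)
print(new_array.dtype)

# Indexing gives a NumPy scalar, not a plain Python float.
np_scalar = new_array[0]
print(type(np_scalar))

# Wrapping it in float() produces the builtin type.
py_scalar = float(np_scalar)
print(type(py_scalar))
```

Note that np.float64 is a subclass of Python's float, so isinstance(np_scalar, float) is already True; the explicit float() call only matters when the exact type is required.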
A solution that I could think of is to create a subclass for float and use that for casting (though to me it looks bad). But I would prefer the previous solution over this if possible. Example -
In [20]: import numpy as np
In [21]: na = np.array([1., 2., 3.])
In [22]: na = np.array([1., 2., 3., np.inf, np.inf])
In [23]: type(na[-1])
Out[23]: numpy.float64
In [24]: na[-1] - na[-2]
C:\Anaconda3\Scripts\ipython-script.py:1: RuntimeWarning: invalid value encountered in double_scalars
if __name__ == '__main__':
Out[24]: nan
In [25]: class x(float):
....:     pass
....:
In [26]: na_new = na.astype(x)
In [28]: type(na_new[-1])
Out[28]: float # No idea why it's showing float; I would have expected '__main__.x'.
In [29]: na_new[-1] - na_new[-2]
Out[29]: nan
In [30]: na_new
Out[30]: array([1.0, 2.0, 3.0, inf, inf], dtype=object)
You can create an anonymous float subclass like this:
>>> new_array = my_array.astype(type('float', (float,), {}))
>>> type(new_array[0])
<type 'float'>
You may want to use the ndarray.item method, as in a.item(). This is also equivalent to (the now deprecated) np.asscalar(a). This has the benefit of working in situations with views and superfluous axes, while the above solutions will currently break. For example,
>>> a = np.asarray(1).view()
>>> a.item() # correct
1
>>> a[0] # breaks
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>> a = np.asarray([[2]])
>>> a.item() # correct
2
>>> a[0] # bad result
array([2])
This also has the benefit of throwing an exception if the array is not actually a scalar, while the a[0] approach will silently proceed (which may lead to bugs sneaking through undetected).
>>> a = np.asarray([1, 2])
>>> a[0] # silently proceeds
1
>>> a.item() # detects incorrect size
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: can only convert an array of size 1 to a Python scalar
Just access the first item of the list/array, using the index access and the index 0:
>>> list_ = [4]
>>> list_[0]
4
>>> array_ = np.array([4])
>>> array_[0]
4
This will be an int since that was what you inserted in the first place. If you need it to be a float for some reason, you can call float() on it then:
>>> float(list_[0])
4.0
Well, if you're reading the data in as a list, just do np.array(map(float, list_of_strings)) (or equivalently, use a list comprehension). (In Python 3, you'll need to call list on the map return value if you use map, since map returns an iterator now.)
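Both variants, written for Python 3, might look like the following sketch (list_of_strings is just sample data made up for the example):

```python
import numpy as np

list_of_strings = ["1.1", "2.2", "3.3"]  # hypothetical input

# map returns an iterator in Python 3, so materialize it with list() first.
via_map = np.array(list(map(float, list_of_strings)))

# The equivalent list comprehension.
via_comprehension = np.array([float(s) for s in list_of_strings])

print(via_map, via_map.dtype)
print(np.array_equal(via_map, via_comprehension))
```

Passing the bare map object to np.array does not raise an error in Python 3; it produces a 0-d object array wrapping the iterator, which is why the list() call matters.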
However, if it's already a numpy array of strings, there's a better way. Use astype().
import numpy as np
x = np.array(['1.1', '2.2', '3.3'])
y = x.astype(float)  # np.float was removed in NumPy 1.24; the builtin float maps to float64
Another option might be numpy.asarray:
import numpy as np
a = ["1.1", "2.2", "3.2"]
b = np.asarray(a, dtype=float)
print(a, type(a), type(a[0]))
print(b, type(b), type(b[0]))
resulting in:
['1.1', '2.2', '3.2'] <class 'list'> <class 'str'>
[1.1 2.2 3.2] <class 'numpy.ndarray'> <class 'numpy.float64'>