Generally your idea of trying to apply astype to each column is fine.
In [590]: X[:,0].astype(int)
Out[590]: array([1, 2, 3, 4, 5])
But you have to collect the results in a separate list. You can't just put them back in X. That list can then be concatenated.
In [601]: numlist=[]; obj_ind=[]
In [602]: for ind in range(X.shape[1]):
.....: try:
.....: x = X[:,ind].astype(np.float32)
.....: numlist.append(x)
.....: except:
.....: obj_ind.append(ind)
In [603]: numlist
Out[603]: [array([ 3., 4., 5., 6., 7.], dtype=float32)]
In [604]: np.column_stack(numlist)
Out[604]:
array([[ 3.],
[ 4.],
[ 5.],
[ 6.],
[ 7.]], dtype=float32)
In [606]: obj_ind
Out[606]: [1]
X is a numpy array with dtype object:
In [582]: X
Out[582]:
array([[1, 'A'],
[2, 'A'],
[3, 'C'],
[4, 'D'],
[5, 'B']], dtype=object)
You could use the same conversion logic to create a structured array with a mix of int and object fields.
In [616]: ytype=[]
In [617]: for ind in range(X.shape[1]):
try:
x = X[:,ind].astype(np.float32)
ytype.append('i4')
except:
ytype.append('O')
In [618]: ytype
Out[618]: ['i4', 'O']
In [620]: Y=np.zeros(X.shape[0],dtype=','.join(ytype))
In [621]: for i in range(X.shape[1]):
Y[Y.dtype.names[i]] = X[:,i]
In [622]: Y
Out[622]:
array([(3, 'A'), (4, 'A'), (5, 'C'), (6, 'D'), (7, 'B')],
dtype=[('f0', '<i4'), ('f1', 'O')])
Y['f0'] gives the the numeric field.
Generally your idea of trying to apply astype to each column is fine.
In [590]: X[:,0].astype(int)
Out[590]: array([1, 2, 3, 4, 5])
But you have to collect the results in a separate list. You can't just put them back in X. That list can then be concatenated.
In [601]: numlist=[]; obj_ind=[]
In [602]: for ind in range(X.shape[1]):
.....: try:
.....: x = X[:,ind].astype(np.float32)
.....: numlist.append(x)
.....: except:
.....: obj_ind.append(ind)
In [603]: numlist
Out[603]: [array([ 3., 4., 5., 6., 7.], dtype=float32)]
In [604]: np.column_stack(numlist)
Out[604]:
array([[ 3.],
[ 4.],
[ 5.],
[ 6.],
[ 7.]], dtype=float32)
In [606]: obj_ind
Out[606]: [1]
X is a numpy array with dtype object:
In [582]: X
Out[582]:
array([[1, 'A'],
[2, 'A'],
[3, 'C'],
[4, 'D'],
[5, 'B']], dtype=object)
You could use the same conversion logic to create a structured array with a mix of int and object fields.
In [616]: ytype=[]
In [617]: for ind in range(X.shape[1]):
try:
x = X[:,ind].astype(np.float32)
ytype.append('i4')
except:
ytype.append('O')
In [618]: ytype
Out[618]: ['i4', 'O']
In [620]: Y=np.zeros(X.shape[0],dtype=','.join(ytype))
In [621]: for i in range(X.shape[1]):
Y[Y.dtype.names[i]] = X[:,i]
In [622]: Y
Out[622]:
array([(3, 'A'), (4, 'A'), (5, 'C'), (6, 'D'), (7, 'B')],
dtype=[('f0', '<i4'), ('f1', 'O')])
Y['f0'] gives the the numeric field.
I think this might help
def func(x):
a = None
try:
a = x.astype(float)
except:
# x.name represents the current index value
# which is column name in this case
obj.append(x.name)
a = x
return a
obj = []
new_df = df.apply(func, axis=0)
This will keep the object columns as such which you can use later.
Note: While using pandas.DataFrame avoid using iteration using loop as this much slower than performing the same operation using apply.
Your second method is to be preferred, but using np.float32 means casting numbers to single precision. For your matrix, this precision is too low: 115.80200375 becomes 115.80200195 due to truncation. You can set double precition explicitly with numpy.float64, or just pass Python's float type as an argument, which means the same.
Z = numpy.array(matr.tolist(), dtype=float)
or, to keep the matrix structure,
Z = numpy.matrix(matr.tolist(), dtype=float)
You can do that when vectorizing a function (which is what we usually want to do anyway). The following example vectorizes and converts the theta function
import numpy as np
import mpmath as mpm
jtheta3_fn = lambda z,q: mpm.jtheta(n=3,z=z,q=q)
jtheta3_fn = np.vectorize(jtheta3_fn,otypes=(float,))
import numpy
a = numpy.arange(0.5, 1.5, 0.1, dtype=numpy.float64)
print(a)
a = numpy.round(a.astype(numpy.float64), 1)
print (a)
print (a.tolist())
Output:
[0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4]
[0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4]
[0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
You can use numpy.round() with astype() method to round your float, getting something consistant upon tolist() conversion.
By the conversion via .tolist() you're not gaining or loosing any precision. You're just converting to another data type which chooses to represent itself differently. You seem to be thinking that by the conversion it turns the 0.8 into 0.7999999999999999, but the situation is, it has never been 0.8. This is due to limited floating point precision. You can verify yourself:
>>> import numpy
>>> a = numpy.arange(0.5, 1.5, 0.1, dtype=numpy.float64)
>>> a
array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4])
>>> a[3]
0.7999999999999999
The situation is that that every type can decide how its instances are represented via __repr__ or __str__. In this specific case np.ndarray decides to show a rounded version of its elements for more clarity. This can be controlled via numpy.set_printoptions. This function only affects how arrays are displayed:
>>> numpy.set_printoptions(floatmode='unique')
>>> a
array([0.5 , 0.6 , 0.7 ,
0.7999999999999999, 0.8999999999999999, 0.9999999999999999,
1.0999999999999999, 1.1999999999999997, 1.2999999999999998,
1.4 ])
A note on np.arange from the docs:
When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use
numpy.linspacefor these cases.
Here are the results for linspace:
>>> np.linspace(0.5, 1.5, 11)
array([0.5 , 0.6 , 0.7 ,
0.8 , 0.9 , 1. ,
1.1 , 1.2000000000000002, 1.3 ,
1.4 , 1.5 ])