Well, if you're reading the data in as a list, just do np.array(map(float, list_of_strings)) (or equivalently, use a list comprehension). (In Python 3, you'll need to call list on the map return value if you use map, since map returns an iterator now.)
However, if it's already a numpy array of strings, there's a better way. Use astype().
import numpy as np
x = np.array(['1.1', '2.2', '3.3'])
y = x.astype(np.float)
Answer from Joe Kington on Stack Overflowpython - How to convert an array of strings to an array of floats in numpy? - Stack Overflow
How to cast binary array to float array?
Is there a way to define a float array in Python? - Stack Overflow
Convert list or numpy array of single element to float in python - Stack Overflow
Videos
Well, if you're reading the data in as a list, just do np.array(map(float, list_of_strings)) (or equivalently, use a list comprehension). (In Python 3, you'll need to call list on the map return value if you use map, since map returns an iterator now.)
However, if it's already a numpy array of strings, there's a better way. Use astype().
import numpy as np
x = np.array(['1.1', '2.2', '3.3'])
y = x.astype(np.float)
Another option might be numpy.asarray:
import numpy as np
a = ["1.1", "2.2", "3.2"]
b = np.asarray(a, dtype=float)
print(a, type(a), type(a[0]))
print(b, type(b), type(b[0]))
resulting in:
['1.1', '2.2', '3.2'] <class 'list'> <class 'str'>
[1.1 2.2 3.2] <class 'numpy.ndarray'> <class 'numpy.float64'>
When you are doing physics simulations you should definitely use floats for everything. 0 is an integer constant in Python, and thus np.tile creates integer arrays; use 0.0 as the argument to np.tile to do floating point arrays; or preferably use the np.zeros(N) instead:
You can check the datatype of any array variable from its dtype member:
>>> np.tile(0, 10).dtype
dtype('int64')
>>> np.tile(0.0, 10).dtype
dtype('float64')
>>> np.zeros(10)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> np.zeros(10).dtype
dtype('float64')
To get a zeroed array of float32 you'd need to give a float32 as the argument:
>>> np.tile(np.float32(0), 10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
or, preferably, use zeros with a defined dtype:
>>> np.zeros(10, dtype='float32')
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
You need x = np.zeros(N), etc.: this declares the arrays as float arrays.
This is the standard way of putting zeros in an array (np.tile() is convenient for creating a tiling with a fixed array).
You may want to use the ndarray.item method, as in a.item(). This is also equivalent to (the now deprecated) np.asscalar(a). This has the benefit of working in situations with views and superfluous axes, while the above solutions will currently break. For example,
>>> a = np.asarray(1).view()
>>> a.item() # correct
1
>>> a[0] # breaks
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>> a = np.asarray([[2]])
>>> a.item() # correct
2
>>> a[0] # bad result
array([2])
This also has the benefit of throwing an exception if the array is not actually a scalar, while the a[0] approach will silently proceed (which may lead to bugs sneaking through undetected).
>>> a = np.asarray([1, 2])
>>> a[0] # silently proceeds
1
>>> a.item() # detects incorrect size
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: can only convert an array of size 1 to a Python scalar
Just access the first item of the list/array, using the index access and the index 0:
>>> list_ = [4]
>>> list_[0]
4
>>> array_ = np.array([4])
>>> array_[0]
4
This will be an int since that was what you inserted in the first place. If you need it to be a float for some reason, you can call float() on it then:
>>> float(list_[0])
4.0
Someone tell me wtf is going on with my output :(
Why doesn't it just show 1.1 in the array
What is the difference between float and double in simplest terms possible. Can you give an example to distinguish between them?
import array
a1 = array.array('f', [1.1])
a2 = array.array('d', [1.1])
print(a1)
print(a2)Output:
array('f', [1.100000023841858])
array('d', [1.1])Generally your idea of trying to apply astype to each column is fine.
In [590]: X[:,0].astype(int)
Out[590]: array([1, 2, 3, 4, 5])
But you have to collect the results in a separate list. You can't just put them back in X. That list can then be concatenated.
In [601]: numlist=[]; obj_ind=[]
In [602]: for ind in range(X.shape[1]):
.....: try:
.....: x = X[:,ind].astype(np.float32)
.....: numlist.append(x)
.....: except:
.....: obj_ind.append(ind)
In [603]: numlist
Out[603]: [array([ 3., 4., 5., 6., 7.], dtype=float32)]
In [604]: np.column_stack(numlist)
Out[604]:
array([[ 3.],
[ 4.],
[ 5.],
[ 6.],
[ 7.]], dtype=float32)
In [606]: obj_ind
Out[606]: [1]
X is a numpy array with dtype object:
In [582]: X
Out[582]:
array([[1, 'A'],
[2, 'A'],
[3, 'C'],
[4, 'D'],
[5, 'B']], dtype=object)
You could use the same conversion logic to create a structured array with a mix of int and object fields.
In [616]: ytype=[]
In [617]: for ind in range(X.shape[1]):
try:
x = X[:,ind].astype(np.float32)
ytype.append('i4')
except:
ytype.append('O')
In [618]: ytype
Out[618]: ['i4', 'O']
In [620]: Y=np.zeros(X.shape[0],dtype=','.join(ytype))
In [621]: for i in range(X.shape[1]):
Y[Y.dtype.names[i]] = X[:,i]
In [622]: Y
Out[622]:
array([(3, 'A'), (4, 'A'), (5, 'C'), (6, 'D'), (7, 'B')],
dtype=[('f0', '<i4'), ('f1', 'O')])
Y['f0'] gives the the numeric field.
I think this might help
def func(x):
a = None
try:
a = x.astype(float)
except:
# x.name represents the current index value
# which is column name in this case
obj.append(x.name)
a = x
return a
obj = []
new_df = df.apply(func, axis=0)
This will keep the object columns as such which you can use later.
Note: While using pandas.DataFrame avoid using iteration using loop as this much slower than performing the same operation using apply.
Nasty little problem... I have been fooling around with this toy example:
>>> arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=np.object)
>>> arr
array([['one', [1, 2, 3]],
['two', [4, 5, 6]]], dtype=object)
My first guess was:
>>> np.array(arr[:, 1])
array([[1, 2, 3], [4, 5, 6]], dtype=object)
But that keeps the object dtype, so perhaps then:
>>> np.array(arr[:, 1], dtype=float)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
You can normally work around this doing the following:
>>> np.array(arr[:, 1], dtype=[('', float)]*3).view(float).reshape(-1, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a readable buffer object
Not here though, which was kind of puzzling. Apparently it is the fact that the objects in your array are lists that throws this off, as replacing the lists with tuples works:
>>> np.array([tuple(j) for j in arr[:, 1]],
... dtype=[('', float)]*3).view(float).reshape(-1, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Since there doesn't seem to be any entirely satisfactory solution, the easiest is probably to go with:
>>> np.array(list(arr[:, 1]), dtype=float)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Although that will not be very efficient, probably better to go with something like:
>>> np.fromiter((tuple(j) for j in arr[:, 1]), dtype=[('', float)]*3,
... count=len(arr)).view(float).reshape(-1, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Based on Jaime's toy example I think you can do this very simply using np.vstack():
arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=np.object)
float_arr = np.vstack(arr[:, 1]).astype(np.float)
This will work regardless of whether the 'numeric' elements in your object array are 1D numpy arrays, lists or tuples.