The problem is that you never convert the numpy array itself. You compute a float32 value and store it as an entry in a float64 numpy array, so numpy, as it should, converts it back to float64.
Try something like this:
import numpy as np
a = np.zeros(4, dtype="float64")
print(a.dtype)
print(type(a[0]))
a = a.astype(np.float32)  # convert the whole array, not just one entry
print(a.dtype)
print(type(a[0]))
The output (tested with Python 2.7):
float64
<type 'numpy.float64'>
float32
<type 'numpy.float32'>
In your case, a is the array tree.tree_.threshold.
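A minimal sketch of the point above, assuming NumPy is installed: writing a float32 scalar into a float64 array stores it as float64, while astype produces a genuine float32 array. The names a and b are illustrative only.

```python
import numpy as np

a = np.zeros(4, dtype=np.float64)

# Assigning a float32 scalar into a float64 array casts it back to float64
a[0] = np.float32(0.1)
print(a.dtype, type(a[0]))  # the array and its elements stay float64

# Converting the whole array is what actually changes the dtype
b = a.astype(np.float32)
print(b.dtype, type(b[0]))
```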
Answer from Glostas on Stack Overflow
I tried hard but was not able to do this, because 'sklearn.tree._tree.Tree' objects are not writable.
This causes a precision issue when generating a PMML file, so I raised a bug there, and they provided an updated solution that no longer converts the values to float64 internally.
For more info, you can follow this link: Precision Issue
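The kind of precision issue described here can be illustrated in isolation, without sklearn or PMML: a value computed in float32 and then widened to float64 keeps its exact float32 value, but printing it at float64 precision exposes extra digits. This is a generic NumPy sketch, not a reproduction of the linked bug.

```python
import numpy as np

x32 = np.float32(0.1)   # nearest float32 to 0.1
x64 = np.float64(x32)   # widened to float64: same value, more digits visible

print(repr(float(x64)))  # 0.10000000149011612, not 0.1

# Narrowing back to float32 recovers the original float32 value exactly
assert np.float32(x64) == x32
```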
How to convert np.float32 to Python float easily?
Converting float64 to float32 rounds off data in hdu object
Convert list of numpy.float64 to float in Python quickly
Type conversion from float64 to float32 (cpu) sometimes crashes
The tolist() method should do what you want. If you have a numpy array, just call tolist():
In [17]: a
Out[17]:
array([ 0. , 0.14285714, 0.28571429, 0.42857143, 0.57142857,
0.71428571, 0.85714286, 1. , 1.14285714, 1.28571429,
1.42857143, 1.57142857, 1.71428571, 1.85714286, 2. ])
In [18]: a.dtype
Out[18]: dtype('float64')
In [19]: b = a.tolist()
In [20]: b
Out[20]:
[0.0,
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857142,
0.8571428571428571,
1.0,
1.1428571428571428,
1.2857142857142856,
1.4285714285714284,
1.5714285714285714,
1.7142857142857142,
1.857142857142857,
2.0]
In [21]: type(b)
Out[21]: list
In [22]: type(b[0])
Out[22]: float
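The same check as a standalone script, assuming NumPy is installed: np.linspace(0, 2, 15) reproduces the array in the session above. For a single NumPy scalar (for example an np.float32), .item() is the scalar counterpart of tolist() and returns a plain Python float.

```python
import numpy as np

a = np.linspace(0, 2, 15)  # float64 array like the one in the session above
b = a.tolist()             # plain Python list of Python floats

# A single NumPy scalar converts with .item()
s = np.float32(1.5).item()

print(type(b), type(b[0]), type(s))
```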
If you actually have a Python list of numpy.float64 objects, then @Alexander's answer is great, or you could convert the list to an array and then use the tolist() method. For example:
In [46]: c
Out[46]:
[0.0,
0.33333333333333331,
0.66666666666666663,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
In [47]: type(c)
Out[47]: list
In [48]: type(c[0])
Out[48]: numpy.float64
@Alexander's suggestion, a list comprehension:
In [49]: [float(v) for v in c]
Out[49]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
Or, convert to an array and then use the tolist() method.
In [50]: np.array(c).tolist()
Out[50]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]
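A small sketch confirming that both approaches produce identical plain Python floats, assuming NumPy is installed; the list c is rebuilt here with np.linspace(0, 2, 7). It also checks that numpy.float64 is a subclass of Python's float, which is why such a list already works in most places that expect floats.

```python
import numpy as np

# A Python list of numpy.float64 objects, like c above
c = [np.float64(v) for v in np.linspace(0, 2, 7)]

via_comprehension = [float(v) for v in c]
via_array = np.array(c).tolist()

# Both approaches yield plain Python floats with identical values
assert via_comprehension == via_array
assert all(type(v) is float for v in via_array)

# numpy.float64 subclasses Python float
assert isinstance(c[0], float)
```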
If you are concerned about speed, here's a comparison. The input, x, is a Python list of numpy.float64 objects:
In [8]: type(x)
Out[8]: list
In [9]: len(x)
Out[9]: 1000
In [10]: type(x[0])
Out[10]: numpy.float64
Timing for the list comprehension:
In [11]: %timeit list1 = [float(v) for v in x]
10000 loops, best of 3: 109 µs per loop
Timing for conversion to numpy array and then tolist():
In [12]: %timeit list2 = np.array(x).tolist()
10000 loops, best of 3: 70.5 µs per loop
So, for this 1000-element list, converting to an array and then calling tolist() was faster (70.5 µs vs. 109 µs per loop).
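To repeat the comparison outside IPython, the standard timeit module can be used. This is a sketch with random input (the seed and sizes are arbitrary choices); the absolute numbers, and possibly the ordering, will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

# 1000 numpy.float64 objects in a plain Python list
x = [np.float64(v) for v in np.random.default_rng(0).random(1000)]

n = 1000
t_comp = timeit.timeit(lambda: [float(v) for v in x], number=n)
t_array = timeit.timeit(lambda: np.array(x).tolist(), number=n)

print(f"list comprehension:     {t_comp / n * 1e6:.1f} µs per loop")
print(f"np.array(x).tolist():   {t_array / n * 1e6:.1f} µs per loop")
```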
You could use a list comprehension:
floats = [float(np_float) for np_float in np_float_list]
- If the dataframe (say df) consists entirely of float64 dtypes, you can do:
df = df.astype('float32')
- If only some columns are float64, select those columns and change their dtype:
# Select columns with 'float64' dtype
float64_cols = list(df.select_dtypes(include='float64'))
# Cast only the selected columns to float32
df[float64_cols] = df[float64_cols].astype('float32')
Try this:
df[df.select_dtypes(np.float64).columns] = df.select_dtypes(np.float64).astype(np.float32)
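A self-contained sketch of the two column-selection approaches above, assuming pandas and NumPy are installed. The frame, its column names, and the make_frame helper are hypothetical; each approach runs on its own copy so the results can be compared.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Hypothetical frame: two float64 columns and one int64 column
    return pd.DataFrame({
        "x": np.array([1.5, 2.5], dtype=np.float64),
        "n": np.array([1, 2], dtype=np.int64),
        "y": np.array([0.1, 0.2], dtype=np.float64),
    })


# Approach 1: collect the float64 column names, then cast only those
df_a = make_frame()
float64_cols = list(df_a.select_dtypes(include="float64"))
df_a[float64_cols] = df_a[float64_cols].astype("float32")

# Approach 2: the one-liner
df_b = make_frame()
df_b[df_b.select_dtypes(np.float64).columns] = df_b.select_dtypes(np.float64).astype(np.float32)

print(df_a.dtypes)
```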
Will numpy.float32 help?
>>> import numpy
>>> PI=3.1415926535897
>>> print PI*PI
9.86960440109
>>> PI32=numpy.float32(PI)
>>> print PI32*PI32
9.86961
If you want to do math operations in float32, converting the operands to float32 may help.
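A Python 3 version of the session above, assuming NumPy is installed. When both operands are np.float32, the product stays float32 and differs slightly from the float64 result. Mixing a float32 with a plain Python float is deliberately left out here, because the resulting type has changed between NumPy versions.

```python
import numpy as np

PI = 3.1415926535897
pi32 = np.float32(PI)

prod64 = PI * PI      # Python float (float64) arithmetic
prod32 = pi32 * pi32  # both operands float32, so the result is float32

print(type(prod32).__name__, prod32, prod64)
```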
Use numpy.ndarray.astype:
import numpy as np
arr_f64 = np.array([1.0000123456789, 2.0000123456789, 3.0000123456789], dtype=np.float64)
arr_f32 = arr_f64.astype(np.float32)
Pay attention to precision:
np.set_printoptions(precision=16)
print("arr_f64 = ", arr_f64)
print("arr_f32 = ", arr_f32)
gives
arr_f64 = [1.0000123456789 2.0000123456789 3.0000123456789]
arr_f32 = [1.0000124000000 2.0000124000000 3.0000124000000]
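To quantify what the cast discards, one option is to widen the float32 array back to float64 and compare. This sketch reuses the arrays above and assumes only NumPy. Because astype rounds to the nearest float32, the relative error per element should stay within float32 machine epsilon.

```python
import numpy as np

arr_f64 = np.array([1.0000123456789, 2.0000123456789, 3.0000123456789], dtype=np.float64)
arr_f32 = arr_f64.astype(np.float32)

# Widen back to float64 to measure what the float32 cast discarded
round_trip = arr_f32.astype(np.float64)
abs_err = np.abs(round_trip - arr_f64)

# Rounding to the nearest float32 keeps relative error within float32 epsilon
eps32 = np.finfo(np.float32).eps
print(abs_err)
print(abs_err <= eps32 * np.abs(arr_f64))
```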
I was trying numpy for matrix calculations, and I used it to solve simultaneous equations.
I have a matrix, `answer`, whose desired values are shown as:
[[4.] [2.] [5.]]
I realise the dots are there because the actual values are floats, so I tried `answer.tolist()`.
This gives me:
[[3.9999999999999987], [1.9999999999999996], [4.999999999999998]]
In my program, I want to convert these values to integers; however, Python's `int()` turns them into 3, 1, 4.
I also tried using `.astype()` to convert to an int:
answer.astype(int, casting='same_kind')
but I get:
TypeError: Cannot cast array data from dtype('float64') to dtype('int32') according to the rule 'same_kind'
I am sure importing the ceiling/floor functions from `math` would solve this, but I am aware that some results may end up being normal decimals (maybe to 4 d.p.) rather than .999999 or .111111 recurring, so rounding isn't the best option.
Any advice on the best way of converting?
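For context: casting='same_kind' rejects float-to-int casts, and the default casting of astype ('unsafe') truncates toward zero just like int(), which is why 3.9999999999999987 becomes 3. One possible approach, sketched under assumptions, is to round to the nearest integer only when every entry is already within a small tolerance of an integer, and otherwise leave the floats alone. The helper name to_int_if_integral and the tolerance atol=1e-9 are hypothetical choices, not an established API.

```python
import numpy as np


def to_int_if_integral(values, atol=1e-9):
    """Hypothetical helper: return ints if every entry is within atol of an
    integer, otherwise return the original float array unchanged."""
    rounded = np.rint(values)  # nearest integer, still a float array
    if np.allclose(values, rounded, rtol=0.0, atol=atol):
        return rounded.astype(int)
    return values  # genuine decimals stay as floats


answer = np.array([[3.9999999999999987], [1.9999999999999996], [4.999999999999998]])
print(answer.astype(int).tolist())          # truncation, like int()
print(to_int_if_integral(answer).tolist())  # rounded only because all are near-integers
```

With a value such as 1.2345 in the array, the helper leaves the array as floats, which addresses the concern about results that are genuine decimals.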