a = np.array([0.123456789121212,2,3], dtype=np.float16)
print("16bit: ", a[0])
a = np.array([0.123456789121212,2,3], dtype=np.float32)
print("32bit: ", a[0])
b = np.array([0.123456789121212121212,2,3], dtype=np.float64)
print("64bit: ", b[0])
- 16bit: 0.1235
- 32bit: 0.12345679
- 64bit: 0.12345678912121212
a = np.array([0.123456789121212,2,3], dtype=np.float16)
print("16bit: ", a[0])
a = np.array([0.123456789121212,2,3], dtype=np.float32)
print("32bit: ", a[0])
b = np.array([0.123456789121212121212,2,3], dtype=np.float64)
print("64bit: ", b[0])
- 16bit: 0.1235
- 32bit: 0.12345679
- 64bit: 0.12345678912121212
float32 is a 32 bit number - float64 uses 64 bits.
That means that float64’s take up twice as much memory - and doing operations on them may be a lot slower in some machine architectures.
However, float64’s can represent numbers much more accurately than 32 bit floats.
They also allow much larger numbers to be stored.
For your Python-Numpy project I'm sure you know the input variables and their nature.
To make a decision we as programmers need to ask ourselves
- What kind of precision does my output need?
- Is speed not an issue at all?
- what precision is needed in parts per million?
A naive example would be if I store weather data of my city as [12.3, 14.5, 11.1, 9.9, 12.2, 8.2]
Next day Predicted Output could be of 11.5 or 11.5164374
do your think storing float 32 or float 64 would be necessary?
It appears you're familiar with the problems of binary floating point accuracy, but anybody who isn't should read Is floating point math broken?
Converting from a type with high accuracy to one with lower accuracy involves rounding. The rules for rounding binary floating point are well established, and always try to deliver the closest representable value to the one you started with. It's quite easy for two 64-bit values to round to the same 32-bit value, because the intervals between consecutive values are so much wider.
For your examples you can see that they're ever so slightly different, one is too low and the other is too high. But look at how close they are!
>>> x = np.float64(0.0003)
>>> y = np.float64(0.0001 * 3)
>>> f'{x:.65f}'
'0.00029999999999999997371893933895137251965934410691261291503906250'
>>> f'{y:.65f}'
'0.00030000000000000002792904796322659422003198415040969848632812500'
When you round them to float32 they become identical because of the necessary rounding applied.
>>> x = x.astype(np.float32)
>>> f'{x:.65f}'
'0.00030000001424923539161682128906250000000000000000000000000000000'
>>> y = y.astype(np.float32)
>>> f'{y:.65f}'
'0.00030000001424923539161682128906250000000000000000000000000000000'
When you see the closest alternatives they could have chosen, it's easy to see why they rounded the way they did.
>>> one = np.float32(1.0)
>>> f'{np.nextafter(x, one):.65f}'
'0.00030000004335306584835052490234375000000000000000000000000000000'
>>> f'{np.nextafter(x, -one):.65f}'
'0.00029999998514540493488311767578125000000000000000000000000000000'
Imagine your two friends ran a marathon in actual times 5:04:16 and 4:58:47. You ask them about their times.
Float64-minded friends might tell you "5 hours 4 minutes" and "4 hours 59 minutes". Not exact, but pretty exact. You conclude they're not equally fast.
Float32-minded friends might tell you "5 hours" and "5 hours". Even less exact. You conclude they are equally fast.
Will numpy.float32 help?
>>>PI=3.1415926535897
>>> print PI*PI
9.86960440109
>>> PI32=numpy.float32(PI)
>>> print PI32*PI32
9.86961
If you want to do math operation on float32, convert the operands to float32 may help you.
Use numpy.ndarray.astype:
import numpy as np
arr_f64 = np.array([1.0000123456789, 2.0000123456789, 3.0000123456789], dtype=np.float64)
arr_f32 = arr_f64.astype(np.float32)
Pay attention to precision:
np.set_printoptions(precision=16)
print("arr_f64 = ", arr_f64)
print("arr_f32 = ", arr_f32)
gives
arr_f64 = [1.0000123456789 2.0000123456789 3.0000123456789]
arr_f32 = [1.0000124000000 2.0000124000000 3.0000124000000]
The printed values are not correct. In your case y is smaller than 1 when using float64 and bigger or equal to 1 when using float32. this is expected since rounding errors depend on the size of the float.
To avoid this kind of problems, when dealing with floating point numbers you should always decide a "minimum error", usually called epsilon and, instead of comparing for equality, checking whether the result is at most distant epsilon from the target value:
In [13]: epsilon = 1e-11
In [14]: number = np.float64(1) - 1e-16
In [15]: target = 1
In [16]: abs(number - target) < epsilon # instead of number == target
Out[16]: True
In particular, numpy already provides np.allclose which can be useful to compare arrays for equality given a certain tolerance. It works even when the arguments aren't arrays(e.g. np.allclose(1 - 1e-16, 1) -> True).
Note however than numpy.set_printoptions doesn't affect how np.float32/64 are printed. It affects only how arrays are printed:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: np.array([1 - 1e-16])
Out[3]: array([ 1.])
In [4]: np.set_printoptions(precision=16)
In [5]: np.array([1 - 1e-16])
Out[5]: array([ 0.9999999999999999])
In [6]: np.float(1) - 1e-16
Out[6]: 0.9999999999999999
Also note that doing print y or evaluating y in the interactive interpreter gives different results:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: print(np.float64(1) - 1e-16)
1.0
The difference is that print calls str while evaluating calls repr:
In [9]: str(np.float64(1) - 1e-16)
Out[9]: '1.0'
In [10]: repr(np.float64(1) - 1e-16)
Out[10]: '0.99999999999999989'
In [26]: x = numpy.float64("1.000000000000001")
In [27]: print x, repr(x)
1.0 1.0000000000000011
In other words, you are plagued by loss of precision in print statement. The value is very slightly different than 1.
>>> numpy.float64(5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
>>> (5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
They are the same number. What differs is the textual representation obtained via by their __repr__ method; the native Python type outputs the minimal digits needed to uniquely distinguish values, while NumPy code before version 1.14.0, released in 2018 didn't try to minimise the number of digits output.
Numpy float64 dtype inherits from Python float, which implements C double internally. You can verify that as follows:
isinstance(np.float64(5.9975), float) # True
So even if their string representation is different, the values they store are the same.
On the other hand, np.float32 implements C float (which has no analog in pure Python) and no numpy int dtype (np.int32, np.int64 etc.) inherits from Python int because in Python 3 int is unbounded:
isinstance(np.float32(5.9975), float) # False
isinstance(np.int32(1), int) # False
So why define np.float64 at all?
np.float64 defines most of the attributes and methods in np.ndarray. From the following code, you can see that np.float64 implements all but 4 methods of np.array:
[m for m in set(dir(np.array([]))) - set(dir(np.float64())) if not m.startswith("_")]
# ['argpartition', 'ctypes', 'partition', 'dot']
So if you have a function that expects to use ndarray methods, you can pass np.float64 to it while float doesn't give you the same.
For example:
def my_cool_function(x):
return x.sum()
my_cool_function(np.array([1.5, 2])) # <--- OK
my_cool_function(np.float64(5.9975)) # <--- OK
my_cool_function(5.9975) # <--- AttributeError
Python's standard float type is a C double: http://docs.python.org/2/library/stdtypes.html#typesnumeric
NumPy's standard numpy.float is the same, and is also the same as numpy.float64.
Data type-wise numpy floats and built-in Python floats are the same, however boolean operations on numpy floats return np.bool_ objects, which always return False for val is True. Example below:
In [1]: import numpy as np
...: an_np_float = np.float32(0.3)
...: a_normal_float = 0.3
...: print(a_normal_float, an_np_float)
...: print(type(a_normal_float), type(an_np_float))
0.3 0.3
<class 'float'> <class 'numpy.float32'>
Numpy floats can arise from scalar output of array operations. If you weren't checking the data type, it is easy to confuse numpy floats for native floats.
In [2]: criterion_fn = lambda x: x <= 0.5
...: criterion_fn(a_normal_float), criterion_fn(an_np_float)
Out[2]: (True, True)
Even boolean operations look correct. However the result of the numpy float isn't a native boolean datatype, and thus can't be truthy.
In [3]: criterion_fn(a_normal_float) is True, criterion_fn(an_np_float) is True
Out[3]: (True, False)
In [4]: type(criterion_fn(a_normal_float)), type(criterion_fn(an_np_float))
Out[4]: (bool, numpy.bool_)
According to this github thread, criterion_fn(an_np_float) == True will evaluate properly, but that goes against the PEP8 style guide.
Instead, extract the native float from the result of numpy operations. You can do an_np_float.item() to do it explicitly (ref: this SO post) or simply pass values through float().