a = np.array([0.123456789121212,2,3], dtype=np.float16)
print("16bit: ", a[0])
a = np.array([0.123456789121212,2,3], dtype=np.float32)
print("32bit: ", a[0])
b = np.array([0.123456789121212121212,2,3], dtype=np.float64)
print("64bit: ", b[0])
- 16bit: 0.1235
- 32bit: 0.12345679
- 64bit: 0.12345678912121212
a = np.array([0.123456789121212,2,3], dtype=np.float16)
print("16bit: ", a[0])
a = np.array([0.123456789121212,2,3], dtype=np.float32)
print("32bit: ", a[0])
b = np.array([0.123456789121212121212,2,3], dtype=np.float64)
print("64bit: ", b[0])
- 16bit: 0.1235
- 32bit: 0.12345679
- 64bit: 0.12345678912121212
float32 is a 32 bit number - float64 uses 64 bits.
That means that float64’s take up twice as much memory - and doing operations on them may be a lot slower in some machine architectures.
However, float64’s can represent numbers much more accurately than 32 bit floats.
They also allow much larger numbers to be stored.
For your Python-Numpy project I'm sure you know the input variables and their nature.
To make a decision we as programmers need to ask ourselves
- What kind of precision does my output need?
- Is speed not an issue at all?
- what precision is needed in parts per million?
A naive example would be if I store weather data of my city as [12.3, 14.5, 11.1, 9.9, 12.2, 8.2]
Next day Predicted Output could be of 11.5 or 11.5164374
do your think storing float 32 or float 64 would be necessary?
Difference between Python float and numpy float32 - Stack Overflow
variable attributes: float32 vs float64, int32 vs int64
python - Different behavior of float32/float64 numpy variables - Stack Overflow
numpy - How to force python float operation on float32 rather than float64? - Stack Overflow
Python's standard float type is a C double: http://docs.python.org/2/library/stdtypes.html#typesnumeric
NumPy's standard numpy.float is the same, and is also the same as numpy.float64.
Data type-wise numpy floats and built-in Python floats are the same, however boolean operations on numpy floats return np.bool_ objects, which always return False for val is True. Example below:
In [1]: import numpy as np
...: an_np_float = np.float32(0.3)
...: a_normal_float = 0.3
...: print(a_normal_float, an_np_float)
...: print(type(a_normal_float), type(an_np_float))
0.3 0.3
<class 'float'> <class 'numpy.float32'>
Numpy floats can arise from scalar output of array operations. If you weren't checking the data type, it is easy to confuse numpy floats for native floats.
In [2]: criterion_fn = lambda x: x <= 0.5
...: criterion_fn(a_normal_float), criterion_fn(an_np_float)
Out[2]: (True, True)
Even boolean operations look correct. However the result of the numpy float isn't a native boolean datatype, and thus can't be truthy.
In [3]: criterion_fn(a_normal_float) is True, criterion_fn(an_np_float) is True
Out[3]: (True, False)
In [4]: type(criterion_fn(a_normal_float)), type(criterion_fn(an_np_float))
Out[4]: (bool, numpy.bool_)
According to this github thread, criterion_fn(an_np_float) == True will evaluate properly, but that goes against the PEP8 style guide.
Instead, extract the native float from the result of numpy operations. You can do an_np_float.item() to do it explicitly (ref: this SO post) or simply pass values through float().
The printed values are not correct. In your case y is smaller than 1 when using float64 and bigger or equal to 1 when using float32. this is expected since rounding errors depend on the size of the float.
To avoid this kind of problems, when dealing with floating point numbers you should always decide a "minimum error", usually called epsilon and, instead of comparing for equality, checking whether the result is at most distant epsilon from the target value:
In [13]: epsilon = 1e-11
In [14]: number = np.float64(1) - 1e-16
In [15]: target = 1
In [16]: abs(number - target) < epsilon # instead of number == target
Out[16]: True
In particular, numpy already provides np.allclose which can be useful to compare arrays for equality given a certain tolerance. It works even when the arguments aren't arrays(e.g. np.allclose(1 - 1e-16, 1) -> True).
Note however than numpy.set_printoptions doesn't affect how np.float32/64 are printed. It affects only how arrays are printed:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: np.array([1 - 1e-16])
Out[3]: array([ 1.])
In [4]: np.set_printoptions(precision=16)
In [5]: np.array([1 - 1e-16])
Out[5]: array([ 0.9999999999999999])
In [6]: np.float(1) - 1e-16
Out[6]: 0.9999999999999999
Also note that doing print y or evaluating y in the interactive interpreter gives different results:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: print(np.float64(1) - 1e-16)
1.0
The difference is that print calls str while evaluating calls repr:
In [9]: str(np.float64(1) - 1e-16)
Out[9]: '1.0'
In [10]: repr(np.float64(1) - 1e-16)
Out[10]: '0.99999999999999989'
In [26]: x = numpy.float64("1.000000000000001")
In [27]: print x, repr(x)
1.0 1.0000000000000011
In other words, you are plagued by loss of precision in print statement. The value is very slightly different than 1.
Will numpy.float32 help?
>>>PI=3.1415926535897
>>> print PI*PI
9.86960440109
>>> PI32=numpy.float32(PI)
>>> print PI32*PI32
9.86961
If you want to do math operation on float32, convert the operands to float32 may help you.
Use numpy.ndarray.astype:
import numpy as np
arr_f64 = np.array([1.0000123456789, 2.0000123456789, 3.0000123456789], dtype=np.float64)
arr_f32 = arr_f64.astype(np.float32)
Pay attention to precision:
np.set_printoptions(precision=16)
print("arr_f64 = ", arr_f64)
print("arr_f32 = ", arr_f32)
gives
arr_f64 = [1.0000123456789 2.0000123456789 3.0000123456789]
arr_f32 = [1.0000124000000 2.0000124000000 3.0000124000000]
It appears you're familiar with the problems of binary floating point accuracy, but anybody who isn't should read Is floating point math broken?
Converting from a type with high accuracy to one with lower accuracy involves rounding. The rules for rounding binary floating point are well established, and always try to deliver the closest representable value to the one you started with. It's quite easy for two 64-bit values to round to the same 32-bit value, because the intervals between consecutive values are so much wider.
For your examples you can see that they're ever so slightly different, one is too low and the other is too high. But look at how close they are!
>>> x = np.float64(0.0003)
>>> y = np.float64(0.0001 * 3)
>>> f'{x:.65f}'
'0.00029999999999999997371893933895137251965934410691261291503906250'
>>> f'{y:.65f}'
'0.00030000000000000002792904796322659422003198415040969848632812500'
When you round them to float32 they become identical because of the necessary rounding applied.
>>> x = x.astype(np.float32)
>>> f'{x:.65f}'
'0.00030000001424923539161682128906250000000000000000000000000000000'
>>> y = y.astype(np.float32)
>>> f'{y:.65f}'
'0.00030000001424923539161682128906250000000000000000000000000000000'
When you see the closest alternatives they could have chosen, it's easy to see why they rounded the way they did.
>>> one = np.float32(1.0)
>>> f'{np.nextafter(x, one):.65f}'
'0.00030000004335306584835052490234375000000000000000000000000000000'
>>> f'{np.nextafter(x, -one):.65f}'
'0.00029999998514540493488311767578125000000000000000000000000000000'
Imagine your two friends ran a marathon in actual times 5:04:16 and 4:58:47. You ask them about their times.
Float64-minded friends might tell you "5 hours 4 minutes" and "4 hours 59 minutes". Not exact, but pretty exact. You conclude they're not equally fast.
Float32-minded friends might tell you "5 hours" and "5 hours". Even less exact. You conclude they are equally fast.
>>> numpy.float64(5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
>>> (5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
They are the same number. What differs is the textual representation obtained via by their __repr__ method; the native Python type outputs the minimal digits needed to uniquely distinguish values, while NumPy code before version 1.14.0, released in 2018 didn't try to minimise the number of digits output.
Numpy float64 dtype inherits from Python float, which implements C double internally. You can verify that as follows:
isinstance(np.float64(5.9975), float) # True
So even if their string representation is different, the values they store are the same.
On the other hand, np.float32 implements C float (which has no analog in pure Python) and no numpy int dtype (np.int32, np.int64 etc.) inherits from Python int because in Python 3 int is unbounded:
isinstance(np.float32(5.9975), float) # False
isinstance(np.int32(1), int) # False
So why define np.float64 at all?
np.float64 defines most of the attributes and methods in np.ndarray. From the following code, you can see that np.float64 implements all but 4 methods of np.array:
[m for m in set(dir(np.array([]))) - set(dir(np.float64())) if not m.startswith("_")]
# ['argpartition', 'ctypes', 'partition', 'dot']
So if you have a function that expects to use ndarray methods, you can pass np.float64 to it while float doesn't give you the same.
For example:
def my_cool_function(x):
return x.sum()
my_cool_function(np.array([1.5, 2])) # <--- OK
my_cool_function(np.float64(5.9975)) # <--- OK
my_cool_function(5.9975) # <--- AttributeError
The numbers compare equal because 58682.7578125 can be exactly represented in both 32 and 64 bit floating point. Let's take a close look at the binary representation:
32 bit: 01000111011001010011101011000010
sign : 0
exponent: 10001110
fraction: 11001010011101011000010
64 bit: 0100000011101100101001110101100001000000000000000000000000000000
sign : 0
exponent: 10000001110
fraction: 1100101001110101100001000000000000000000000000000000
They have the same sign, the same exponent, and the same fraction - the extra bits in the 64 bit representation are filled with zeros.
No matter which way they are cast, they will compare equal. If you try a different number such as 58682.7578124 you will see that the representations differ at the binary level; 32 bit looses more precision and they won't compare equal.
(It's also easy to see in the binary representation that a float32 can be upcast to a float64 without any loss of information. That is what numpy is supposed to do before comparing both.)
import numpy as np
a = 58682.7578125
f32 = np.float32(a)
f64 = np.float64(a)
u32 = np.array(a, dtype=np.float32).view(dtype=np.uint32)
u64 = np.array(a, dtype=np.float64).view(dtype=np.uint64)
b32 = bin(u32)[2:]
b32 = '0' * (32-len(b32)) + b32 # add leading 0s
print('32 bit: ', b32)
print('sign : ', b32[0])
print('exponent: ', b32[1:9])
print('fraction: ', b32[9:])
print()
b64 = bin(u64)[2:]
b64 = '0' * (64-len(b64)) + b64 # add leading 0s
print('64 bit: ', b64)
print('sign : ', b64[0])
print('exponent: ', b64[1:12])
print('fraction: ', b64[12:])
The same value is stored internally, only it doesn't show all digits with a print
Try:
print "%0.8f" % float_32
See related Printing numpy.float64 with full precision
Hi
in my toy language, I want to have a single float type "decimal". I am not sure if I should go with f16 or f32 internally.
I assume f32 will take more memory but in today's world is that even relevant.
I also read somewhere that GPUs don't support f32 and I need to have f16 anyway if I want to use any UI libraries.
At this point, I really am not sure what i should go for. I really want to keep a single floating type. My language is not targetted at IoT devices and performance is one of the goals.
Why not go for Float64? This gives you the most accuracy, and in a toy language I doubt it will have any significant performance difference from the other float types.
Also, you shouldn't name your float type "decimal" unless it is an actual decimal float (as opposed to a binary float), since it will just cause confusion.
Calling a binary floating point type "decimal" would certainly be ill advised.