Surprisingly, it's totally OK to use 16 bits, not just for fun but in production as well. For example, in this video Jeff Dean talks about 16-bit calculations at Google, around 52:00. A quote from the slides:
Neural net training very tolerant of reduced precision
Since GPU memory is the main bottleneck in ML computation, there has been a lot of research on precision reduction. E.g.
Gupta et al., "Deep Learning with Limited Numerical Precision", about 16-bit fixed-point (not floating-point) training with stochastic rounding.
Courbariaux et al., "Training Deep Neural Networks with Low Precision Multiplications", about 10-bit activations and 12-bit parameter updates.
And this is not the limit. Courbariaux et al, "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1". Here they discuss 1-bit activations and weights (though higher precision for the gradients), which makes the forward pass super fast.
Of course, I can imagine some networks may require high precision for training, but I would recommend at least trying 16 bits for training a big network and switching to 32 bits if it proves to work worse.
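To make the stochastic rounding idea concrete, here is a minimal NumPy sketch. It is not taken from the Gupta et al. paper: the function name `stochastic_round_fixed`, the choice of 8 fractional bits, and the signed 16-bit clipping range are all assumptions made for illustration. The point it shows is that rounding up with probability equal to the fractional part keeps the rounded values unbiased on average, whereas round-to-nearest always lands on the same grid point.

```python
import numpy as np

def stochastic_round_fixed(x, frac_bits=8, rng=None):
    """Quantize x to a signed 16-bit fixed-point grid with stochastic rounding.

    frac_bits (assumed value) is the number of bits after the binary point.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part.
    round_up = rng.random(scaled.shape) < (scaled - lower)
    q = lower + round_up
    # Keep the integer code inside the signed 16-bit range.
    q = np.clip(q, -2**15, 2**15 - 1)
    return q / scale

rng = np.random.default_rng(0)
samples = stochastic_round_fixed(np.full(100_000, 0.3), frac_bits=8, rng=rng)
nearest = np.round(0.3 * 256) / 256

print("round-to-nearest:", nearest)            # always the same grid point
print("stochastic mean: ", samples.mean())     # close to 0.3 on average
```

Each individual sample is still a multiple of 1/256, so the per-value error is no smaller; only the average error shrinks, which is what matters when many small updates are accumulated.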
Answer from Maxim on Stack Overflow
Hi, in my toy language, I want to have a single float type "decimal". I am not sure whether I should go with f16 or f32 internally.
I assume f32 will take more memory, but in today's world is that even relevant?
I also read somewhere that GPUs don't support f32 and I need to have f16 anyway if I want to use any UI libraries.
At this point, I really am not sure what I should go for. I really want to keep a single floating type. My language is not targeted at IoT devices and performance is one of the goals.
Why not go for Float64? This gives you the most accuracy, and in a toy language I doubt it will have any significant performance difference from the other float types.
Also, you shouldn't name your float type "decimal" unless it is an actual decimal float (as opposed to a binary float), since it will just cause confusion.
Calling a binary floating point type "decimal" would certainly be ill advised.
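The binary-versus-decimal distinction can be illustrated with Python's standard `decimal` module. This is only a sketch of the general difference, not a statement about how the toy language should implement its type: a binary float cannot represent 0.1 exactly, while a decimal float can.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 are stored as nearby binary fractions.
binary_sum = 0.1 + 0.2

# Decimal floating point: the same literals are represented exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(binary_sum == 0.3)               # False
print(decimal_sum == Decimal("0.3"))   # True
```

A user who sees a type called "decimal" would reasonably expect the second behavior, which is why the name is confusing for a binary float.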
float16 training is tricky: your model might not converge when using standard float16, but float16 does save memory, and is also faster if you are using the latest Volta GPUs. Nvidia recommends "Mixed Precision Training" in the latest doc and paper.
To better use float16, you need to manually and carefully choose the loss_scale. If loss_scale is too large, you may get NANs and INFs; if loss_scale is too small, the model might not converge. Unfortunately, there is no common loss_scale for all models, so you have to choose it carefully for your specific model.
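The effect of loss_scale can be simulated with plain NumPy, without any training framework. The helper name `fp16_gradient` and the specific gradient values and scale factors below are hypothetical, chosen only to show the two failure modes: a tiny gradient underflows to zero in float16 unless it is scaled up first, and an overly large scale pushes a large gradient past the float16 maximum (65504) to infinity.

```python
import numpy as np

def fp16_gradient(grad, loss_scale):
    """Simulate a gradient held in float16 under loss scaling.

    The gradient is multiplied by loss_scale, cast to float16,
    then unscaled in float32 (as a mixed-precision setup would do).
    """
    scaled = np.float32(grad) * np.float32(loss_scale)
    with np.errstate(over="ignore"):  # silence the overflow warning on cast
        stored = np.float16(scaled)
    return np.float32(stored) / np.float32(loss_scale)

print(fp16_gradient(1e-8, 1))         # underflows: the gradient is lost
print(fp16_gradient(1e-8, 1024))      # survives, roughly 1e-8
print(fp16_gradient(100.0, 65536))    # overflows to inf
```

This is why no single scale works for every model: the usable range depends on how small and how large that model's gradients actually get.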
If you just want to reduce the memory usage, you could also try tf.to_bfloat16, which might converge better.
python - The real difference between float32 and float64 - Stack Overflow
import numpy as np

# Store the same value at three precisions and print what each dtype keeps.
a = np.array([0.123456789121212, 2, 3], dtype=np.float16)
print("16bit: ", a[0])
a = np.array([0.123456789121212, 2, 3], dtype=np.float32)
print("32bit: ", a[0])
b = np.array([0.123456789121212121212, 2, 3], dtype=np.float64)
print("64bit: ", b[0])
- 16bit: 0.1235
- 32bit: 0.12345679
- 64bit: 0.12345678912121212
float32 is a 32-bit number; float64 uses 64 bits.
That means float64 values take up twice as much memory, and operations on them may be a lot slower on some machine architectures.
However, float64 can represent numbers much more accurately than float32.
They also allow much larger numbers to be stored.
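The size, precision, and range differences can be read directly from NumPy with `np.finfo`. The helper name `float_summary` is just for illustration; the attributes it reads (`bits`, `precision`, `max`, `eps`) are part of `np.finfo`.

```python
import numpy as np

def float_summary(dtype):
    """Collect the basic limits of a floating-point dtype."""
    info = np.finfo(dtype)
    return {
        "bits": info.bits,                 # storage size in bits
        "decimal_digits": info.precision,  # approximate reliable decimal digits
        "max": float(info.max),            # largest finite value
        "eps": float(info.eps),            # gap between 1.0 and the next value
    }

for dtype in (np.float16, np.float32, np.float64):
    print(np.dtype(dtype).name, float_summary(dtype))
```

float16 tops out at 65504 with about 3 reliable decimal digits, float32 gives about 6 digits, and float64 about 15.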
For your Python-Numpy project I'm sure you know the input variables and their nature.
To make a decision, we as programmers need to ask ourselves:
- What kind of precision does my output need?
- Is speed not an issue at all?
- What precision is needed, in parts per million?
A naive example: suppose I store my city's weather data as [12.3, 14.5, 11.1, 9.9, 12.2, 8.2].
The next day's predicted output could be 11.5 or 11.5164374.
Do you think storing float32 or float64 would be necessary?
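One way to answer that question for the weather readings is to measure it. The sketch below stores the same six values at each precision and reports the memory used and the worst-case storage error relative to float64. The helper name `storage_report` is made up for this example.

```python
import numpy as np

readings = [12.3, 14.5, 11.1, 9.9, 12.2, 8.2]

def storage_report(values, dtype):
    """Return (bytes used, max absolute error vs. float64) for a dtype."""
    reference = np.asarray(values, dtype=np.float64)
    stored = reference.astype(dtype)
    max_error = float(np.max(np.abs(stored.astype(np.float64) - reference)))
    return stored.nbytes, max_error

for dtype in (np.float16, np.float32, np.float64):
    nbytes, err = storage_report(readings, dtype)
    print(f"{np.dtype(dtype).name}: {nbytes} bytes, max abs error {err:.2e}")
```

For data recorded to one decimal place, even the float16 error stays well below 0.05, so the extra digits of float32 or float64 only matter if later computations need them.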