To the first question: there's no hardware support for float16 on a typical processor (at least outside the GPU). NumPy does exactly what you suggest: convert the float16 operands to float32, perform the scalar operation on the float32 values, then round the float32 result back to float16. It can be proved that the results are still correctly-rounded: the precision of float32 is large enough (relative to that of float16) that double rounding isn't an issue here, at least for the four basic arithmetic operations and square root.
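This widen-compute-narrow scheme is easy to check from Python using only NumPy's public API (a quick sanity check, not a look at the internals; the random arrays here are arbitrary):

```python
import numpy as np

# Emulate the scheme described above: widen both float16 operands to
# float32, operate, and round the float32 result back to float16.
rng = np.random.default_rng(0)
xs = rng.standard_normal(1000).astype(np.float16)
ys = rng.standard_normal(1000).astype(np.float16)

direct = xs + ys  # NumPy's built-in float16 addition
emulated = (xs.astype(np.float32) + ys.astype(np.float32)).astype(np.float16)
assert np.array_equal(direct, emulated)  # bit-for-bit identical
```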

In the current NumPy source, here is what the definitions of the four basic arithmetic operations look like for float16 scalar operations.

#define half_ctype_add(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) + npy_half_to_float(b))
#define half_ctype_subtract(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) - npy_half_to_float(b))
#define half_ctype_multiply(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) * npy_half_to_float(b))
#define half_ctype_divide(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) / npy_half_to_float(b))

The code above is taken from scalarmath.c.src in the NumPy source. You can also take a look at loops.c.src for the corresponding code for array ufuncs. The supporting npy_half_to_float and npy_float_to_half functions are defined in halffloat.c, along with various other support functions for the float16 type.

For the second question: no, there's no float8 type in NumPy. float16 is a standardized type (described in the IEEE 754 standard), that's already in wide use in some contexts (notably GPUs). There's no IEEE 754 float8 type, and there doesn't appear to be an obvious candidate for a "standard" float8 type. I'd also guess that there just hasn't been that much demand for float8 support in NumPy.
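If you want to confirm float16's IEEE 754 binary16 layout from Python, np.finfo reports the bit allocation directly:

```python
import numpy as np

# float16 is IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 fraction bits.
info = np.finfo(np.float16)
print(info.bits, info.nexp, info.nmant)  # 16 5 10
print(float(info.max))                   # 65504.0
```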

Answer from Mark Dickinson on Stack Overflow
A second answer on the same Stack Overflow question:

This answer builds on the float8 aspect of the question; the accepted answer covers the rest pretty well. One of the main reasons there isn't a widely accepted float8 type, beyond the lack of a standard, is that it's not very useful in practice.

Primer on Floating Point

In standard notation, a float[n] data type is stored using n bits in memory, so at most 2^n unique values can be represented. In IEEE 754, a handful of these possible values, like nan, aren't even numbers as such. That means every floating point representation (even float256) has gaps in the set of rational numbers it can represent, and a number that falls in a gap is rounded to the nearest representable value. Generally, the higher the n, the smaller these gaps are.

You can see the gaps in action if you use the struct package to get the binary representation of some float32 numbers. It's a bit startling to run into at first: near one billion, 32 consecutive integers all collapse to the same float32 value:

import struct

billion_as_float32 = struct.pack('f', 1000000000)
for i in range(32):
    # Each of the next 32 integers packs to the exact same float32 bytes.
    assert billion_as_float32 == struct.pack('f', 1000000001 + i)

Generally, floating point is best at tracking only the most significant bits, so that if your numbers have the same scale, the important differences are preserved. Floating point formats generally differ only in how they distribute the available bits between the significand and the exponent. For instance, IEEE 754 float32 uses 1 sign bit, 8 exponent bits, and 23 stored significand bits (24 bits of effective precision, counting the implicit leading bit).
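To make that split concrete, here is a small sketch that pulls the three fields out of a float32 with the struct module (field widths as in IEEE 754 binary32; the value 6.3 is arbitrary):

```python
import struct

# Unpack a float32 into its sign, exponent, and fraction bit fields.
bits = struct.unpack('>I', struct.pack('>f', 6.3))[0]
sign = bits >> 31                 # 1 bit
exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
fraction = bits & 0x7FFFFF        # 23 stored bits

# Reassemble the value: (-1)^sign * 1.fraction * 2^(exponent - 127)
value = (-1) ** sign * (1 + fraction / 2**23) * 2 ** (exponent - 127)
print(sign, exponent - 127)  # 0 2  (6.3 is about 1.575 * 2**2)
print(value)                 # close to 6.3, within float32 precision
```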

Back to float8

By the above logic, a float8 value can only ever take on 256 distinct values, no matter how cleverly you split the bits between significand and exponent. Unless you're keen on rounding numbers to one of 256 arbitrary values clustered near zero, it's probably more efficient to just track the 256 possibilities in an int8.

For instance, if you wanted to track a very small range with coarse precision you could divide the range you wanted into 256 points and then store which of the 256 points your number was closest to. If you wanted to get really fancy you could have a non-linear distribution of values either clustered at the centre or at the edges depending on what mattered most to you.
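A minimal sketch of that linear scheme (every name here is made up for illustration): quantize a known range onto 256 uint8 codes and decode by reversing the mapping.

```python
import numpy as np

# Hypothetical 8-bit linear quantizer over a fixed range [lo, hi].
lo, hi = -1.0, 1.0
step = (hi - lo) / 255          # 256 evenly spaced code points

def encode(x):
    codes = np.round((x - lo) / step)
    return np.clip(codes, 0, 255).astype(np.uint8)

def decode(codes):
    return codes.astype(np.float64) * step + lo

x = np.array([-0.5, 0.0, 0.73])
roundtrip = decode(encode(x))
# Every decoded value lands within about half a step (~0.004) of the input.
assert np.max(np.abs(roundtrip - x)) <= step / 2 + 1e-12
```

A non-linear variant would just replace the uniform grid with whatever 256 points matter most to you.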

The likelihood of anyone else (or even yourself, later on) needing this exact scheme is extremely small, and most of the time the extra byte or three you pay as a penalty for using float16 or float32 instead is too small to make a meaningful difference. Hence, almost no one bothers to write up a float8 implementation.

Answer to "Define a custom float8 in python-numpy and convert from/to float16?" on Stack Overflow

I'm by no means an expert in numpy, but I like to think about FP representation problems. The size of your array is not huge, so any reasonably efficient method should be fine. It doesn't look like there's an 8-bit FP representation in NumPy, I guess because the precision isn't so good.

To convert to an array of bytes, each containing a single 8-bit FP value, for a one-dimensional array, all you need is:

import numpy as np

float16 = np.array([6.3, 2.557, 2.557, 6.3], dtype=np.float16)  # some data
float8s = float16.tobytes()[1::2]
print(float8s)  # b'FAAF'

This just takes the high-order byte from each 16-bit float by lopping off the low-order part, giving a 1-bit sign, a 5-bit exponent, and a 2-bit significand. The high-order byte is always the second byte of each pair on a little-endian machine. I've tried it on a 2D array and it works the same. Note that this truncates; rounding properly would be a whole other can of worms.

Getting back to 16 bits is just a matter of inserting zeros. I found this method by experiment and there is undoubtedly a better way, but it reads the byte array as 8-bit integers, writes a new array of 16-bit integers, and converts that back to an array of floats. Note the big-endian representation when converting back to bytes: we want the 8-bit values to end up as the high-order bytes of the integers.

ints8 = np.frombuffer(float8s, dtype='u1')             # read bytes as 8-bit ints
ints16 = np.array(ints8, dtype='>u2')                  # widen to big-endian 16-bit
float16 = np.frombuffer(ints16.tobytes(), dtype='f2')  # reread as float16
print(float16)  # the values come back as 6.0, 2.5, 2.5, 6.0

You can definitely see the loss of precision! I hope this helps. If this is sufficient, let me know. If not, I'd be up for looking deeper into it.
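For what it's worth, the same truncate-and-restore round trip can be written without any byte slicing, by viewing the float16 buffer as integers and shifting (a sketch that, like the slicing trick above, assumes a little-endian host):

```python
import numpy as np

# Keep only the high byte of each float16 (sign, exponent, top 2 fraction
# bits), then zero-fill the low byte to get back to float16.
f16 = np.array([6.3, 2.557, 2.557, 6.3], dtype=np.float16)
codes = (f16.view(np.uint16) >> 8).astype(np.uint8)         # "float8" codes
restored = (codes.astype(np.uint16) << 8).view(np.float16)  # zero-fill low byte
print(restored.tolist())  # [6.0, 2.5, 2.5, 6.0]
```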
