import numpy as np

a = np.array([0.123456789121212,2,3], dtype=np.float16)
print("16bit: ", a[0])

a = np.array([0.123456789121212,2,3], dtype=np.float32)
print("32bit: ", a[0])

b = np.array([0.123456789121212121212,2,3], dtype=np.float64)
print("64bit: ", b[0])
  • 16bit: 0.1235
  • 32bit: 0.12345679
  • 64bit: 0.12345678912121212
Answer from Furkan Gulsen on Stack Overflow
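The precision steps seen in the output above can also be read directly from numpy's dtype metadata; a small sketch using np.finfo:

```python
import numpy as np

# finfo.precision: approximate number of reliable decimal digits per dtype
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{dtype.__name__}: {info.precision} digits, eps = {info.eps}")
```

This matches the printed values: float16 keeps about 3 reliable decimal digits, float32 about 6-7, and float64 about 15-16.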
Python⇒Speed
The problem with float32: you only get 16 million values
February 1, 2023 - Libraries like NumPy and Pandas ... (“double-precision” or 64-bit floats) to numpy.float32 (“single-precision” or 32-bit floats) cuts memory usage in half....
Quora
What are the differences between float32 and float64? - Quora
Rounding error and accumulation: float32 accumulates larger relative errors in sums, dot products, integrators, and iterative algorithms. float64 reduces rounding noise and is often necessary for numerically sensitive computations (e.g., matrix ...
NumPy
Data types — NumPy v2.4 Manual
Python’s floating-point numbers are usually 64-bit floating-point numbers, nearly equivalent to numpy.float64. In some unusual situations it may be useful to use floating-point numbers with more precision.
GitHub
variable attributes: float32 vs float64, int32 vs int64 · Issue #926 · Unidata/netcdf4-python
May 7, 2019 - When I define my variables as data_type of "f" or "f4", these should be 32-bit floating-point decimals. However, when defining a variable attribute whose value is a floating-point via setncattr, the result is a 64-bit floating-point ("do...
Author   ghost
Medium
Understanding numpy.float64. If you think you need to spend $2,000… | by Amit Yadav | Medium
February 8, 2025 - This means it uses 64 bits to store a floating-point number, allowing for more precision when performing calculations compared to smaller types like float32. Think of it like having more space to store a more accurate decimal value.
GitHub
float64 is faster than float32?? · Issue #1913 · Theano/Theano
June 12, 2014 - (float32) Time elapsed : 2.470041 second (float64) Time elapsed : 1.159385 second (float32 optimized) Time elapsed : 1.616751 second (float32 highly-optimized) Time elapsed : 0.410361 second
Author   junku901
Quora
What is np.float32 and np.float64 in numpy in simple terms? - Quora
Float32 has larger rounding error and lower dynamic range; more prone to underflow/overflow and loss of significance in subtraction of similar numbers. Float64 accumulates much smaller relative error and is preferred for scientific computing ...
Google Groups
64 bit system: Force Float default to Float32
Note also that while Float64 * Float32 (and other arithmetic ops) produce Float64, for other "less precise" numeric types, Float32 wins out, making ...
Top answer
1 of 2
2

It appears you're familiar with the problems of binary floating-point accuracy, but anybody who isn't should read "Is floating point math broken?"

Converting from a type with high accuracy to one with lower accuracy involves rounding. The rules for rounding binary floating point are well established, and always try to deliver the closest representable value to the one you started with. It's quite easy for two 64-bit values to round to the same 32-bit value, because the intervals between consecutive values are so much wider.

For your examples you can see that they're ever so slightly different: one is too low and the other too high. But look at how close they are!

>>> x = np.float64(0.0003)
>>> y = np.float64(0.0001 * 3)
>>> f'{x:.65f}'
'0.00029999999999999997371893933895137251965934410691261291503906250'
>>> f'{y:.65f}'
'0.00030000000000000002792904796322659422003198415040969848632812500'

When you round them to float32 they become identical because of the necessary rounding applied.

>>> x = x.astype(np.float32)
>>> f'{x:.65f}'
'0.00030000001424923539161682128906250000000000000000000000000000000'
>>> y = y.astype(np.float32)
>>> f'{y:.65f}'
'0.00030000001424923539161682128906250000000000000000000000000000000'

When you see the closest alternatives they could have chosen, it's easy to see why they rounded the way they did.

>>> one = np.float32(1.0)
>>> f'{np.nextafter(x, one):.65f}'
'0.00030000004335306584835052490234375000000000000000000000000000000'
>>> f'{np.nextafter(x, -one):.65f}'
'0.00029999998514540493488311767578125000000000000000000000000000000'
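The "wider intervals" argument can be checked numerically: np.spacing gives the gap to the next representable value at a given magnitude, and near 0.0003 the float32 gap is many orders of magnitude wider than the float64 gap.

```python
import numpy as np

# Gap between consecutive representable values near 0.0003
print(f'{np.spacing(np.float64(0.0003)):.3e}')  # tiny gap in float64
print(f'{np.spacing(np.float32(0.0003)):.3e}')  # far wider gap in float32
```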
2 of 2
2

Imagine your two friends ran a marathon in actual times 5:04:16 and 4:58:47. You ask them about their times.

Float64-minded friends might tell you "5 hours 4 minutes" and "4 hours 59 minutes". Not exact, but pretty exact. You conclude they're not equally fast.

Float32-minded friends might tell you "5 hours" and "5 hours". Even less exact. You conclude they are equally fast.
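In code, the analogy plays out exactly like the 0.0003 example from the previous answer:

```python
import numpy as np

# Two values that are distinguishable at float64 precision...
x = np.float64(0.0003)
y = np.float64(0.0001 * 3)
print(x == y)                           # False: float64 keeps them apart

# ...collapse to the same value after rounding to float32.
print(np.float32(x) == np.float32(y))   # True: "equally fast"
```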

GitHub
float32 vs float64 · Issue #20 · erikbern/ann-benchmarks
June 8, 2016 - Below are my justifications for using float32 in the context of approximate K-NN search: The search process is usually approximate, and the feature data are usually noisy by nature, so typically the precision of float64 is not necessary.
Author   aaalgo
DNMTechs
Understanding the difference between float32 and float64 in Python 3 – DNMTechs – Sharing and Storing Technology Knowledge
Understanding the difference between float32 and float64 in Python is important for handling numerical data with precision. Float32 uses 32 bits to represent a floating-point number, while float64 uses 64 bits. Float64 provides higher precision but requires more memory compared to float32.
Top answer
1 of 3
4

The printed values are not correct. In your case y is smaller than 1 when using float64, and greater than or equal to 1 when using float32. This is expected, since rounding errors depend on the size of the float.

To avoid this kind of problem, when dealing with floating-point numbers you should always decide on a "minimum error", usually called epsilon, and, instead of comparing for equality, check whether the result is at most epsilon away from the target value:

In [13]: epsilon = 1e-11

In [14]: number = np.float64(1) - 1e-16

In [15]: target = 1

In [16]: abs(number - target) < epsilon   # instead of number == target
Out[16]: True

In particular, numpy already provides np.allclose, which can be useful for comparing arrays for equality given a certain tolerance. It works even when the arguments aren't arrays (e.g. np.allclose(1 - 1e-16, 1) -> True).
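As a sketch of how the tolerance works: np.allclose combines a relative tolerance rtol and an absolute tolerance atol, checking |a - b| <= atol + rtol * |b| elementwise:

```python
import numpy as np

print(np.allclose(1 - 1e-16, 1))       # True with the default tolerances
print(np.allclose(1e-20, 0))           # True: atol (1e-8 by default) dominates near zero
print(np.allclose(1e-20, 0, atol=0))   # False once the absolute tolerance is removed
```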

Note however that numpy.set_printoptions doesn't affect how np.float32/64 are printed. It affects only how arrays are printed:

In [1]: import numpy as np

In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999

In [3]: np.array([1 - 1e-16])
Out[3]: array([ 1.])

In [4]: np.set_printoptions(precision=16)

In [5]: np.array([1 - 1e-16])
Out[5]: array([ 0.9999999999999999])

In [6]: np.float(1) - 1e-16
Out[6]: 0.9999999999999999

Also note that doing print(y) and evaluating y in the interactive interpreter give different results:

In [1]: import numpy as np

In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999

In [3]: print(np.float64(1) - 1e-16)
1.0

The difference is that print calls str while evaluating calls repr:

In [9]: str(np.float64(1) - 1e-16)
Out[9]: '1.0'

In [10]: repr(np.float64(1) - 1e-16)
Out[10]: '0.99999999999999989'
2 of 3
1
In [26]: x  = numpy.float64("1.000000000000001")

In [27]: print x, repr(x)
1.0 1.0000000000000011

In other words, you are plagued by loss of precision in the print statement. The value is very slightly different from 1.

Top answer
1 of 2
69
>>> numpy.float64(5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
>>> (5.9975).hex()
'0x1.7fd70a3d70a3dp+2'

They are the same number. What differs is the textual representation obtained via their __repr__ method: the native Python type outputs the minimal digits needed to uniquely distinguish values, while NumPy (before version 1.14.0, released in 2018) didn't try to minimise the number of digits output.

2 of 2
3

Numpy float64 dtype inherits from Python float, which implements C double internally. You can verify that as follows:

isinstance(np.float64(5.9975), float)   # True

So even if their string representation is different, the values they store are the same.

On the other hand, np.float32 implements C float (which has no analog in pure Python), and no numpy int dtype (np.int32, np.int64, etc.) inherits from Python int, because in Python 3 int is unbounded:

isinstance(np.float32(5.9975), float)   # False
isinstance(np.int32(1), int)            # False
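The same inheritance relationships can be verified at the class level with issubclass:

```python
import numpy as np

# np.float64 is the one numpy scalar type that subclasses Python's float
print(issubclass(np.float64, float))   # True
print(issubclass(np.float32, float))   # False: C float has no Python analog
print(issubclass(np.int64, int))       # False: Python 3 int is unbounded
```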

So why define np.float64 at all?

np.float64 defines most of the attributes and methods of np.ndarray. From the following code, you can see that np.float64 implements all but 4 of np.array's methods:

[m for m in set(dir(np.array([]))) - set(dir(np.float64())) if not m.startswith("_")]

# ['argpartition', 'ctypes', 'partition', 'dot']

So if you have a function that expects to use ndarray methods, you can pass it an np.float64, while a plain float won't work.

For example:

def my_cool_function(x):
    return x.sum()

my_cool_function(np.array([1.5, 2]))   # <--- OK
my_cool_function(np.float64(5.9975))   # <--- OK
my_cool_function(5.9975)               # <--- AttributeError
Top answer
1 of 1
8

TL;DR: there are several possible explanations for this, depending on the target math library you actually use on your machine. The major reason in your case is likely that the two paths are not implemented the same way: Numpy has its own implementation for single-precision numbers, while it calls the standard math library implementation for double-precision numbers. The former uses SIMD instructions while the latter does not. It also turns out Numpy is often packaged not with performance in mind but compatibility (so the same generic packages may be provided for all x86-64 CPUs).


First of all, some math libraries use SIMD instructions (eg. SSE, AVX, Neon) to compute several floating-point items at the same time. The Intel SVML is one such library. AFAIK, it can be linked to speed up Numpy operations like this. Such instructions operate on fixed-length SIMD registers (eg. 128 bits for SSE/Neon, 256 bits for AVX). The thing is, double-precision numbers are twice as big, so half as many items fit in a register. This means the amount of work for double-precision numbers is twice as large, resulting in execution that is roughly twice as slow.

The above point only applies if the target math library uses SIMD instructions, though. Scalar instructions are not affected by this problem. However, the latency of double-precision operations can sometimes be significantly higher than that of single-precision ones (because double precision requires far more transistors, since there are more bits to compute and a longer dependency chain between the transistors themselves). On recent mainstream x86-64 architectures both types are equally fast for scalar ADD/MUL/FMA operations (but not for advanced operations like SQRT). You can check that here.

The default math library on Linux is the glibc. It uses a scalar implementation by default [0] which is based on a lookup table. Since the lookup table is too small [1] to contain the result for every floating-point value, the result is adjusted using an nth-order polynomial. Double-precision numbers are so precise that the table needs to be bigger [2] and the polynomial needs a significantly higher order. While a bigger lookup table is generally slower because of possible cache misses, I do not expect this to be a significant issue given the large number of items to be computed. The higher-order polynomial is likely to be responsible for the slowdown. Indeed, higher-order polynomials are computed using a sequence of fused multiply-add (FMA) operations, resulting in a dependency chain. This dependency chain matters, since the latency of an FMA is already generally pretty "big" (~4 cycles on mainstream x86-64 processors). That being said, I do not expect this to cause a 5x slowdown (as we can see on your machine). Maybe 2x-3x, but not more. This means it is likely not the main issue.

At this point, I was surprised and curious about what it could be, so I profiled what is going on my Linux machine (having a i5-9600KF CPU). Here is the results:

  • the double-precision computation is about 4 times slower than the single-precision one, so I can mostly reproduce your issue;
  • during the double-precision computation, the two most expensive functions are __ieee754_exp_fma and exp@@GLIBC_2.29. This shows that Numpy uses the glibc on my machine (as expected);
  • during the single-precision computation, 99% of the time is spent in Numpy, so the glibc is not even used to compute the exponential in this case!

It turns out Numpy apparently has its own implementation of exp for single-precision numbers! This means there is no need to even call an (expensive) external function from another library in this case. And there's more: it turns out this single-precision implementation in Numpy actually uses SIMD instructions, while the glibc one does not. I found that out by analyzing performance counters on my processor. It looks like the main function responsible for computing this in Numpy is called npy_exp (the file npy_math_internal.h.src calls it).

SIMD implementations can be much faster in single precision. However, they are not as fast for double precision because of the two aforementioned factors: fewer items per SIMD register and more FMA instructions required to reach the requested higher precision. A double-precision SIMD implementation is only worth it when the processor provides wide SIMD registers and low-latency FMA instructions [3]. This was not the case on most processors a decade ago [4]: 128-bit SSE was the main standard SIMD instruction set on x86-64 CPUs (the 256-bit AVX instruction set was pretty new), there was no widely-accepted FMA instruction set (so the latency was that of a MUL+ADD), and the latency of floating-point operations was slightly higher. This is probably why Numpy did not implement it, not to mention that it takes some time to implement and much more to maintain. Nowadays, the 512-bit AVX-512 instruction set supported by very recent CPUs is wide enough [5] for double-precision implementations to use SIMD instructions. In fact, Intel developers added such an implementation directly to Numpy (see here)! This means that if you run on a CPU supporting AVX-512 (eg. AMD Zen4 or Intel IceLake), the effect should be significantly less visible (still about twice slower). If you want faster double-precision computation, I advise you to try a SIMD math library like the SVML.

Update: I ran the code on an IceLake server (with AVX-512) and the results are significantly closer, as expected, when Numpy is built correctly. That being said, I discovered that neither the standard Ubuntu packages nor PIP seem to enable AVX-512 in Numpy. In fact, the resulting packages were very inefficient, so I rebuilt Numpy from scratch to do the job correctly. The double-precision version is then only 1.76 times slower. Here are the results:

Intel CoffeeLake (i5-9600KF) -- from standard debian packages:
    64bit time: 0.39326330599578796
    32bit time: 0.08715593699889723    (x4.51 faster)

Intel IceLake (Xeon 8375C) -- from standard Ubuntu packages:
    64bit time: 1.4964690230001452
    32bit time: 0.5068110490001345     (x2.95 faster)

Intel IceLake (Xeon 8375C) -- from PIP packages:
    64bit time: 0.9384758739997778
    32bit time: 0.550410964999628      (x1.85 faster)

Intel IceLake (Xeon 8375C) -- manual Numpy install enabling AVX-512:
    64bit time: 0.09678016599991679
    32bit time: 0.054961627000011504   (x1.76 faster)

Note that the IceLake results are faster than the CoffeeLake ones (expected) despite the lower frequency of the IceLake processor (~3.5 GHz turbo frequency for the IceLake Xeon vs ~4.5 GHz for the CoffeeLake one). I advise you to rebuild Numpy yourself to be sure the resulting package uses your machine efficiently.


Footnotes:

0: the glibc has a SIMD implementation for single-precision numbers, but it looks like it is only called by GCC if -ffast-math is provided, so it might not be IEEE-754 compliant.
1: the lookup table cannot be too big because it would require too much memory and would also cause expensive cache misses.
2: the lookup table is actually 4 times bigger for double-precision numbers in the latest version of the glibc (2**5=32 vs 2**7=128 items).
3: the latency of the SIMD instructions can be mitigated by computing more items concurrently, but this requires more SIMD registers and the number available is limited (especially on old processors), not to mention that many people were not aware of this latency issue.
4: that is, >15 years ago, since people keep their machines for several years and Numpy targets good performance on average machines.
5: AVX-512 also provides many more registers than SSE (x4) and AVX (x2), so the latency can now be mitigated more easily.
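For reference, a minimal benchmark along the lines of the one discussed above. The absolute timings and the 64-bit/32-bit ratio depend heavily on the CPU (SIMD support) and on how the Numpy package was built, so only the ratio on your own machine is meaningful:

```python
import time
import numpy as np

def bench(dtype, n=1_000_000, reps=5):
    """Time np.exp over a large array of the given dtype."""
    a = np.random.rand(n).astype(dtype)
    start = time.perf_counter()
    for _ in range(reps):
        np.exp(a)
    return time.perf_counter() - start

t64 = bench(np.float64)
t32 = bench(np.float32)
print(f"64bit time: {t64:.4f}")
print(f"32bit time: {t32:.4f}   (x{t64 / t32:.2f} faster)")
```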

Top answer
1 of 2
63

Python's standard float type is a C double: http://docs.python.org/2/library/stdtypes.html#typesnumeric

NumPy's standard numpy.float is the same, and is also the same as numpy.float64. (Note: the numpy.float alias was deprecated in NumPy 1.20 and removed in 1.24; use the builtin float or numpy.float64 instead.)

2 of 2
3

Data type-wise, numpy floats and built-in Python floats are the same. However, boolean operations on numpy floats return np.bool_ objects, which always yield False for val is True. Example below:

In [1]: import numpy as np
   ...: an_np_float = np.float32(0.3)
   ...: a_normal_float = 0.3
   ...: print(a_normal_float, an_np_float)
   ...: print(type(a_normal_float), type(an_np_float))

0.3 0.3
<class 'float'> <class 'numpy.float32'>

Numpy floats can arise from scalar output of array operations. If you weren't checking the data type, it is easy to confuse numpy floats for native floats.

In [2]: criterion_fn = lambda x: x <= 0.5
   ...: criterion_fn(a_normal_float), criterion_fn(an_np_float)

Out[2]: (True, True)

Even boolean operations look correct. However, the result for the numpy float isn't a native boolean datatype, so an identity check such as val is True fails.


In [3]: criterion_fn(a_normal_float) is True, criterion_fn(an_np_float) is True
Out[3]: (True, False)

In [4]: type(criterion_fn(a_normal_float)), type(criterion_fn(an_np_float))
Out[4]: (bool, numpy.bool_)

According to this github thread, criterion_fn(an_np_float) == True will evaluate properly, but that goes against the PEP8 style guide.

Instead, extract the native float from the result of numpy operations. You can do an_np_float.item() to do it explicitly (ref: this SO post) or simply pass values through float().
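A minimal illustration of that pattern:

```python
import numpy as np

result = np.float32(0.3) <= 0.5       # an np.bool_, not a builtin bool
print(result is True)                 # False: the identity check fails
print(bool(result) is True)           # True after explicit conversion
print(type(np.float32(0.3).item()))   # <class 'float'>: .item() yields the native type
```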

Julia Programming Language
Moving from `Float64` to `Float32` not improving performance - Performance - Julia Programming Language
October 10, 2022 - Hi, I’m trying to optimize a script. I first did the usual stuff to avoid memory allocation, but when moving all data structures from Float64 to Float32, which should result in a reduced memory usage, the code has become super slow. I’m still trying to get my head around this, but I was ...
Edureka Community
Difference between Python float and numpy float32 | Edureka Community
March 4, 2019 - What is the difference between the built in float and numpy.float32? For example, here is a code: ... > 58682.8 What is the built in float format?