Videos
How do I convert a number to floating-point?
To convert a decimal number to a floating-point representation, follow these steps:
- Convert the entire number to binary and then normalize it, i.e. write it in scientific notation using base 2.
- Truncate the fraction after the decimal point to the allowable fraction length.
- Extract the sign, exponent, and fraction segments.
Refer to the IEEE754 standard for more detailed instructions.
Why do we use floating-point numbers?
The IEEE754 floating-point format standard enables efficient storage and processing of numbers in computers.
-
From a hardware perspective, many simplifications of floating-point operations can be made to significantly speed up arithmetic, thanks to the IEEE754 standard's specifications.
-
For software, floats are very precise and typically lose a few millionths (if not less) per operation, which enables high-precision scientific and engineering applications.
What is an IEEE754 floating-point number?
In computing, a floating-point number is a data format used to store fractional numbers in a digital machine. A floating-point number is represented by a series of bits (1s and 0s). Computers perform mathematical operations on these bits directly instead of how a human would do the math. When a human wants to read the floating-point number, a complex formula reconstructs the bits into the decimal system.