A 32 bit floating point number has 23 + 1 bits of mantissa and an 8 bit exponent (-126 to 127 is used though) so the largest number you can represent is:
(1 + 1 / 2 + ... 1 / (2 ^ 23)) * (2 ^ 127) =
(2 ^ 23 + 2 ^ 23 + .... 1) * (2 ^ (127 - 23)) =
(2 ^ 24 - 1) * (2 ^ 104) ~= 3.4e38
Answer from Andreas Brinck on Stack OverflowA 32 bit floating point number has 23 + 1 bits of mantissa and an 8 bit exponent (-126 to 127 is used though) so the largest number you can represent is:
(1 + 1 / 2 + ... 1 / (2 ^ 23)) * (2 ^ 127) =
(2 ^ 23 + 2 ^ 23 + .... 1) * (2 ^ (127 - 23)) =
(2 ^ 24 - 1) * (2 ^ 104) ~= 3.4e38
These numbers come from the IEEE-754 standard, which defines the standard representation of floating point numbers. Wikipedia article at the link explains how to arrive at these ranges knowing the number of bits used for the signs, mantissa, and the exponent.
A 32 bit floating point number has 23 + 1 bits of mantissa and an 8 bit exponent (-126 to 127 is used though) so the largest number you can represent is:
(1 + 1 / 2 + ... 1 / (2 ^ 23)) * (2 ^ 127) =
(2 ^ 23 + 2 ^ 23 + .... 1) * (2 ^ (127 - 23)) =
(2 ^ 24 - 1) * (2 ^ 104) ~= 3.4e38
Answer from Andreas Brinck on Stack OverflowVideos
C does not define float as described by OP. The one suggested by OP: binary32, the most popular, is one of many conforming formats.
What C does define
5.2.4.2.2 Characteristics of floating types
s sign (±1)
b base or radix of exponent representation (an integer > 1)
e exponent (an integer between a minimum emin and a maximum emax)
p precision (the number of base-b digits in the significand)
fk nonnegative integers less than b (the significand digits)
x = s*power(b,e)*Σ(k=1, p, f[k]*power(b,-k))
For binary32, the max value is
x = (+1)*power(2, 128)*(0.1111111111 1111111111 1111 binary)
x = 3.402...e+38
Given 32-bits to define a float many other possibilities occur. Example: A float could exist just like binary32, yet not support infinity/not-a-number. The leaves another exponent available numbers. The max value is then 2*3.402...e+38.
binary32 describes its significand ranging up to 1.11111... binary. The C characteristic formula above ranges up to 0.111111...
C uses single-precision floating point notation, which means that a 32-bit float has 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa. The mantissa is calculated by summing each mantissa bit * 2^(- (bit_index)). The exponent is calculated by converting the 8 bit binary number to a decimal and subtracting 127 (thus you can have negative exponents as well), and the sign bit indicates whether or not is negative. The formula is thus:
(-1)^S * 1.M * 2^(E - 127)
Where S is the sign, M is the mantissa, and E is the exponent. See https://en.wikipedia.org/wiki/Single-precision_floating-point_format for a better mathematical explanation.
To explicitly answer your question, that means for a 32 bit float, the largest value is (-1)^0 * 1.99999988079071044921875 * 2^128, which is 6.8056469327705771962340836696903385088 × 10^38 according to Wolfram. The smallest value is the negative of that.
Floating point numbers are stored as an exponent and a fraction within the space available.
For some systems where float is implemented as an IEEE 754 value, the results would looks as below.
sign : 1 bit
exponent : 8 bits
fraction : 23 bits
The exponent allows numbers from 2 ^ (-127) (2 to the power -127) to 2 ^ 128 ( 2 to the power 128).
Allowing a range of numbers from
5.87747E-39 3.40282E+38
the fraction point gives a fraction such as .12313
Thus with 23 bits of values, the accuracy of a number is about 7 decimal digits or 1.19 E-7
For more details see wikipedia : IEEE 754-1985
On a given system, the <cfloat> / <float.h> will give the limits. For non IEEE 754 based representations, you would have to understand how the numbers are stored to calculate the limits.
-2^(n-1) to (2^(n-1)-1) is the formula to calculate the range of data types.
Where n = no.of.bits of the primitive data type.
For example: for the byte data type, n = 8 bits
-2^(8-1) to (2^(8-1)-1)
The above calculation will give you -128 to 127. Now, coming to the question of why it’s not 255. The reason is that byte, int, short, and double are signed data types meaning it has half the range below 0 (negative) and half the range above 0 (positive). The first bit represents a sign (+ or -). The remaining bits are 7. That’s why 2^(8-1) = 128. We take 0 as a positive sign, so the range is 2^(8-1) - 1 for positive numbers. Floating point numbers are stored as an exponent and a fraction within the space available.