Find the largest power of 2 that is less than or equal to your number. For example, if you start with x = 10.0, then 2^3 = 8, so the exponent is 3. The exponent is biased by 127, so it is stored as 127 + 3 = 130. The mantissa is then 10.0/8 = 1.25. The leading 1 is implicit, so we only need to represent 0.25, which is 010 0000 0000 0000 0000 0000 when expressed as a 23-bit unsigned fractional quantity. The sign bit is 0 for positive. So we have:
s | exp [130] | mantissa [(1).25] |
0 | 100 0001 0 | 010 0000 0000 0000 0000 0000 |
0x41200000
You can test the representation with a simple C program, e.g.
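#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef union
{
    uint32_t i;   /* the same 32 bits viewed as an unsigned integer */
    float f;      /* ... and as an IEEE-754 single-precision float */
} U;

int main(void)
{
    U u;
    u.f = 10.0f;
    printf("%g = %#" PRIx32 "\n", u.f, u.i);   /* prints 10 = 0x41200000 */
    return 0;
}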
Answer from Paul R on Stack Overflow
Take the number 172.625. This number is in base-10 format.
To convert it to base-2 format, first convert 172 into binary:
128 64 32 16 8 4 2 1
1 0 1 0 1 1 0 0
172 = 10101100
Then convert 0.625 into binary by repeatedly multiplying the fraction by 2 and taking the integer part:
0.625*2 = 1.25 -> 1
0.25*2 = 0.5 -> 0
0.5*2 = 1.0 -> 1
0.625 = 0.101
So 172.625 in binary is 10101100.101, i.e. 10101100.101 * 2^0.
Shift the binary point 7 places to the left to normalize it:
1.0101100101 * 2^7 (normalized)
1.0101100101 is the significand (the leading 1 is implicit, so only 0101100101 is stored)
7 is the exponent
Add the bias of 127 to the exponent: 7 + 127 = 134
Convert 134 into binary:
134 = 10000110
The number is positive, so the sign bit is 0.
0 | 10000110 | 01011001010000000000000
Explanation: The high-order bit is the sign of the number; the number is stored in sign-magnitude format. The exponent is stored in an 8-bit field, biased by 127. The digits to the right of the binary point are stored in the low-order 23 bits. NOTE: this is the IEEE 754 32-bit (single-precision) floating-point format.
10/32 = 5/16 = 5·2^-4 = 1.25·2^-2 = 1.01₂·2^-2.
The sign is +, the exponent is -2, and the significand is 1.01₂.
A positive sign is encoded as 0.
Exponent -2 is encoded as -2 + 127 = 125 = 01111101₂.
Significand 1.01₂ is 1.01000000000000000000000₂, and it is encoded using the last 23 bits, 01000000000000000000000₂.
Putting these together, the IEEE-754 encoding is 0 01111101 01000000000000000000000. To convert to hexadecimal, first organize into groups of four bits: 0011 1110 1010 0000 0000 0000 0000 0000. Then the hexadecimal can be easily read: 3EA00000₁₆.
I see it like this:
10/32 =               // input
10/2^5 =              // convert division by a power of 2 to a bit shift
1010b >> 5 =
.01010b               // fractional result
--^-------------------------------------------------------------
  |
  first nonzero bit is the exponent position and the start of the mantissa
----------------------------------------------------------------
man  = (1)010b        // first one is implicit
exp  = -2 + 127 = 125 // position from the binary point + bias
sign = 0              // non-negative
----------------------------------------------------------------
0 01111101 01000000000000000000000 b
^    ^          ^
|    |          mantissa + zero padding
|    exp
sign
----------------------------------------------------------------
0011 1110 1010 0000 0000 0000 0000 0000 b
   3    E    A    0    0    0    0    0 h
----------------------------------------------------------------
3EA00000h
Yes, Eric Postpischil's answer uses the same approach (+1, by the way), but I didn't like the formatting: it was not clear at first glance what to do without reading the text carefully.
I have figured out the equation to get the exact number from the binary representation: sign * 2^(exp - b) * mantissa, where sign is +1 or -1, exp is the stored 8-bit exponent and b is the bias, 127.
Edit: To get the right mantissa, use ONLY the stored fraction bits (the 23 bits after the exponent), plus the implicit leading 1. So, for example, if your fraction bits are 011 1111...
then you would compute (1*2^0) + (0*2^-1) + (1*2^-2) + (1*2^-3) + ...
Keep doing this for all 23 fraction bits and you will get your mantissa.
Instead of calculating all those bits after the binary point, which is a lot of work IMO, just scale everything by 2^23 and subtract 23 more from the exponent to compensate.
This is explained in my article about floating point for Delphi.
First decode:
0 - 1000 1101 - 011 1111 1100 0000 0000 0000
Insert hidden bit:
0 - 1000 1101 - 1011 1111 1100 0000 0000 0000
In hex:
0 - 8D - BFC000
0x8D = 141, minus bias of 127, that becomes 14.
I like to scale things, so the calculation is:
sign * full_mantissa * 2^(exp - bias - len)
where full_mantissa is the mantissa, including the hidden bit, as an integer; bias = 127 and len = 23 (the number of stored mantissa bits).
So then it becomes:
1 * 0xBFC000 * 2^(14-23) = 0xBFC000 / 0x200 = 0x5FE0 = 24544
because 2^(14-23) = 2^-9 = 1 / 2^9 = 1 / 0x200.