How is a floating-point number stored?

A float value is typically stored in 32 bits (4 bytes) of memory, in the IEEE 754 single-precision (binary32) format: 1 sign bit, 8 exponent bits and 23 fraction bits.

Consider float store_float = 33.154;
(Conversion steps as per IEEE 754 floating point)

1. Determine the sign bit: the value is positive, hence the sign bit is 0.

2. Convert the integer part to binary: 33 –> 100001

3. Convert the fractional part to binary: 0.154 –> 0.00100111011011001000101… (first 23 bits)
0.154 * 2 = 0.308 (0)
0.308 * 2 = 0.616 (0)
0.616 * 2 = 1.232 (1)
0.232 * 2 = 0.464 (0)
0.464 * 2 = 0.928 (0)
0.928 * 2 = 1.856 (1)
0.856 * 2 = 1.712 (1)
0.712 * 2 = 1.424 (1)
…
To convert a fraction, multiply the fractional part by 2. The digit before the point in the result (0 or 1) is the next bit. Keep only the new fractional part (0.232 after 1.232, for example) and repeat until the fractional part becomes 0 or you have enough bits.

For a 32-bit float, IEEE 754 stores 23 fraction bits. Here the integer part 33 already contributes 5 of them after normalization (step 4), so only the first 18 fraction bits of 0.154 are stored. The bits after those decide the rounding. 0.154 never reaches 0 in binary, so the expansion has to be cut off and rounded. The IEEE 754 default rounding mode is round to nearest, ties to even.


4. Combine the integer and fraction binaries:

100001.00100111011011001000101…

As per IEEE 754, a normal floating-point number is represented in the form ±1.fraction × 2^exponent.
Move the binary point to the left until only a single 1 remains to its left:
1.0000100100111011011001000101… × 2^5
The point has moved 5 positions to the left, so the unbiased exponent is 5.

5. Calculating the biased exponent
What are biased and unbiased exponents?
Bias: a fixed constant added to the actual exponent so that the stored exponent field is always a non-negative integer. This lets both positive and negative exponents be stored without a separate exponent sign. The bias is specific to the floating-point format being used, and the biased exponent is the actual exponent plus this bias.

Unbiased exponent: the actual value of the exponent, without the bias. It is the true power of the base (2 for the binary IEEE 754 formats) by which the significand 1.fraction is multiplied.

(For 32-bit, Bias =127; For 64-bit, Bias =1023)
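For normal binary32 values, the standard decoding formula ties together the sign bit s, the 23 stored fraction bits f and the 8-bit biased exponent E:

$$x = (-1)^{s} \times (1.f)_2 \times 2^{E - 127}, \qquad 1 \le E \le 254$$

The exponent field values 0 and 255 are reserved: 0 for zero and subnormals, and 255 for infinities and NaNs. As a result, the unbiased exponent of a normal float ranges from -126 to +127.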

To get the biased exponent, add the bias to the unbiased exponent: 5 + 127 = 132

6. Convert the biased exponent to 8-bit binary
132 –> 10000100

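7. Combine the sign bit, biased exponent and fraction.
Sign bit: 0, Exponent: 10000100, Fraction: 00001001001110110110010. These are the first 23 bits after the binary point of the normalized value; the leading 1 is implicit and is not stored. The next bit is 0, so rounding to nearest leaves these 23 bits unchanged.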

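The final binary representation of 33.154 as per IEEE 754 is
0 10000100 00001001001110110110010
(0x42049DB2 in hexadecimal). Because 0.154 has no finite binary expansion, the value actually stored is the float closest to 33.154, namely 33.15399932861328125.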

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An Article by: Yashwanth Naidu Tikkisetty
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Published by Yashwanth Naidu Tikkisetty

Discover more from Cosmic Writer