
   floating point

Using the binary number system we know:
   1 is 2^0 = 1
  10 is 2^1 = 2 
 100 is 2^2 = 4

To express fractional numbers, the binary positional number system is simply extended: digits to the right of the "binary point" have negative exponents.
  .1   is 2^-1 = 1/2 (.5)
  .01  is 2^-2 = 1/4 (.25)
  .001 is 2^-3 = 1/8 (.125)
   
  So, 101.11 = 5.75 [ 4 + 1 + .5 + .25]
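
The positional rule above can be sketched in a few lines of Python. This is only an illustration; the function name binary_to_decimal is made up for this example and is not part of any library.

```python
def binary_to_decimal(text):
    """Convert a binary string such as '101.11' to a float."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the binary point have weights 2^0, 2^1, 2^2, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2 ** power
    # Digits right of the binary point have weights 2^-1, 2^-2, 2^-3, ...
    for position, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -position
    return value

print(binary_to_decimal("101.11"))  # 5.75
```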
   
Binary      Decimal     Description
0.1         0.5         a half
0.01        0.25        a quarter
0.001       0.125       an eighth
0.0001      0.0625      a sixteenth
0.00001     0.03125     a thirty-second
0.000001    0.015625    a sixty-fourth

IEEE 754

IEEE 754 uses this representation as the basis for its scheme to represent floating point numbers (see Wikipedia, IEEE 754-1985).
   The 32-bit single-precision format assigns bit fields to 3 things:
      1. sign     (1 bit)  0 positive, 1 negative
      2. exponent (8 bits) unsigned, bias 127 (stored as exponent + 127)
      3. fraction (23 bits) normalized: leading one not represented
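
One way to look at these three fields for a real value is to pack it as a 32-bit float and mask out the bits. The sketch below uses Python's standard struct module; the helper name float_fields is an assumption made for this example.

```python
import struct

def float_fields(x):
    """Return (sign, stored exponent, fraction bits) of x as a 32-bit float."""
    # Pack as big-endian single precision, then reread the same 4 bytes
    # as an unsigned 32-bit integer so the bits can be masked.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, still biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, leading one not stored
    return sign, exponent, fraction

sign, exponent, fraction = float_fields(0.15625)
print(sign, exponent, format(fraction, "023b"))
# 0 124 01000000000000000000000
```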
   [figure: IEEE 754 single-precision bit layout (image from Wikipedia)]
   
In the example shown above, the sign bit is 0, so the number is positive; the stored exponent is 124, so the true exponent is -3 (124 - 127); and the fraction field (.0100...) becomes the significand (or mantissa) 1.01 when the leading 1 is supplied.
   So we have +1.01_2 x 2^-3

   Shifting the binary point left 3 places, 1.01_2 becomes .00101_2

   In decimal, then, the represented number is: .125 + .03125,
   which is +0.15625_10

 
   OR (another way to approach the conversion)
   ==

   1.01_2 = 1.25_10, and 2^-3 = .125
           1.25 x .125 = .15625_10

   The represented number is therefore: +1.25 x 2^-3,
   which is +0.15625_10
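
The second approach can be written as a small formula: value = (-1)^sign x (1 + fraction / 2^23) x 2^(exponent - 127). The Python sketch below applies it to the fields from the example. The name decode_single is hypothetical, and the function covers only normalized numbers; zero, denormalized values, infinities and NaN use reserved exponent values that it does not handle.

```python
def decode_single(sign, exponent, fraction):
    """Value of a normalized single-precision number from its three fields."""
    # Supply the implicit leading 1, then add the 23 fraction bits
    # interpreted as a binary fraction.
    significand = 1 + fraction / 2 ** 23
    # Remove the bias of 127 to recover the true exponent.
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

# Fraction bits .0100...0: only the 2^-2 bit is set, which is bit 21 of the field.
print(decode_single(0, 124, 1 << 21))  # 0.15625
```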
   
