CSAPP Floating Point

Floating Point

Fractional Binary Numbers

  • Representation
    • Bits to right of "binary point" represent fractional powers of \(2\)
    • Represents rational number:

      \[\sum_{k=-j}^i b_k \times 2^k \]

Using this sum, we can write the value of any fractional binary number \(b_i b_{i-1} \ldots b_0 . b_{-1} \ldots b_{-j}\).
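
As a quick check of the sum above, here is a minimal sketch that evaluates a fractional binary string bit by bit. The helper name `fractional_binary_value` is made up for illustration, and exact rational arithmetic (`fractions.Fraction`) is used so that no rounding hides the result.

```python
from fractions import Fraction

def fractional_binary_value(bits: str) -> Fraction:
    """Evaluate a string such as '101.11' as the sum of b_k * 2**k."""
    int_part, _, frac_part = bits.partition(".")
    total = Fraction(0)
    # Bits left of the binary point: weights 2**0, 2**1, ... from right to left.
    for k, b in enumerate(reversed(int_part)):
        total += int(b) * Fraction(2) ** k
    # Bits right of the binary point: weights 2**-1, 2**-2, ... from left to right.
    for k, b in enumerate(frac_part, start=1):
        total += int(b) * Fraction(1, 2 ** k)
    return total

print(fractional_binary_value("101.11"))  # 23/4, i.e. 5 3/4
print(fractional_binary_value("10.111"))  # 23/8, i.e. 2 7/8
```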

Fractional Binary Numbers: Examples

Observations
  • Divide by 2 by shifting right (unsigned)
  • Multiply by 2 by shifting left
  • Numbers of the form \(0.111111\ldots_2\) are just below 1.0
    • Use notation \(1.0 - \varepsilon\)
      \(\varepsilon\) depends on how many bits you have to the right of the binary point
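
The observations above can be checked numerically. The sketch below (the helper name `all_ones_fraction` is hypothetical) shows that a fraction made of n one-bits falls short of 1.0 by exactly \(2^{-n}\), and that shifting an unsigned integer by one bit halves or doubles it.

```python
from fractions import Fraction

def all_ones_fraction(n: int) -> Fraction:
    """Value of 0.11...1 in binary with n ones after the binary point."""
    return sum(Fraction(1, 2 ** k) for k in range(1, n + 1))

for n in (1, 4, 8):
    value = all_ones_fraction(n)
    # The gap to 1.0 is epsilon = 2**-n, so it shrinks as n grows.
    print(n, value, 1 - value)

# Shifting an unsigned integer right/left by one bit halves/doubles it.
x = 0b1011000          # 88
print(x >> 1, x << 1)  # 44 176
```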

Representable Numbers

Limitation #1

Can only exactly represent numbers of the form \(\frac{x}{2^k}\).
Other rational numbers have repeating bit representations, for example:
\(1/3\): \(0.0101010101[01]\ldots_2\)
\(1/5\): \(0.001100110011[0011]\ldots_2\)
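
To make this limitation concrete, the following sketch tests whether a rational number has a power-of-two denominator and prints the first bits of its binary expansion. Both helper names are invented for this example, and `binary_fraction_digits` assumes its argument lies in \([0, 1)\).

```python
from fractions import Fraction

def is_exactly_representable(q: Fraction) -> bool:
    """True if q = x / 2**k, i.e. its reduced denominator is a power of two."""
    d = q.denominator
    return (d & (d - 1)) == 0

def binary_fraction_digits(q: Fraction, n: int) -> str:
    """First n bits after the binary point of q, assuming 0 <= q < 1."""
    bits = []
    for _ in range(n):
        q *= 2                 # shift the next bit in front of the binary point
        bit = int(q >= 1)
        bits.append(str(bit))
        q -= bit               # keep only the remaining fraction
    return "0." + "".join(bits)

print(is_exactly_representable(Fraction(5, 8)))    # True
print(is_exactly_representable(Fraction(1, 3)))    # False
print(binary_fraction_digits(Fraction(1, 3), 12))  # 0.010101010101
print(binary_fraction_digits(Fraction(1, 5), 12))  # 0.001100110011
```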

Limitation #2

There is just one setting of the binary point within the \(w\) bits, so the range of numbers is limited:

  • moving the binary point right \(\rightarrow\) larger range, but fewer fractional bits (coarser precision)
  • moving the binary point left \(\rightarrow\) more fractional bits (finer precision), but smaller range
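
The trade-off can be tabulated with a small sketch. It assumes an unsigned \(w\)-bit fixed-point number with a chosen number of bits to the right of the binary point; the function name `fixed_point_limits` is hypothetical.

```python
from fractions import Fraction

def fixed_point_limits(w: int, frac_bits: int):
    """Largest value and step size of an unsigned w-bit number
    with frac_bits bits to the right of the binary point."""
    step = Fraction(1, 2 ** frac_bits)   # smallest increment
    largest = (2 ** w - 1) * step        # value with all w bits set
    return largest, step

for frac_bits in (0, 4, 8):
    largest, step = fixed_point_limits(8, frac_bits)
    print(frac_bits, largest, step)
```

With 8 bits, moving the binary point from the far right (0 fractional bits) to the far left (8 fractional bits) shrinks the largest value from 255 to 255/256 while the step size improves from 1 to 1/256.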

IEEE Floating Point

Floating Point Representation
  • Numerical Form: \((-1)^s M 2^E\)
    • Sign bit \(s\) determines whether the number is negative or positive
    • Significand \(M\) (mantissa) is normally a fractional value in the range \([1.0, 2.0)\)
    • Exponent \(E\) weights the value by a power of two
  • Encoding
    • MSB is the sign bit \(s\)
    • exp field encodes \(E\) (but is not equal to \(E\))
    • frac field encodes \(M\) (but is not equal to \(M\))
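
The numerical form can be recovered for a concrete value with the standard library. This is only a sketch of the mathematical decomposition, not of the bit encoding: `math.frexp` returns a mantissa in \([0.5, 1)\), so it is rescaled here to the \([1.0, 2.0)\) range used above. The function name is made up, and the input is assumed to be finite and nonzero.

```python
import math

def sign_significand_exponent(x: float):
    """Return (s, M, E) with x == (-1)**s * M * 2**E and 1.0 <= M < 2.0.
    Assumes x is finite and nonzero."""
    s = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return s, m * 2, e - 1      # rescale so the significand lies in [1, 2)

print(sign_significand_exponent(15213.0))  # (0, 1.8570556640625, 13)
print(sign_significand_exponent(-0.75))    # (1, 1.5, -1)
```
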
Precision options
  • Single precision: 32 bits
    \(s\): 1 bit
    \(exp\): 8 bits
    \(frac\): 23 bits
  • Double precision: 64 bits
    \(s\): 1 bit
    \(exp\): 11 bits
    \(frac\): 52 bits
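
To see these field widths on real values, the sketch below packs a Python float as IEEE single or double precision with the standard `struct` module and slices the resulting integer into its three fields. The helper `float_fields` is a hypothetical name for this illustration.

```python
import struct

def float_fields(x: float, double: bool = False):
    """Return the (s, exp, frac) bit fields of x encoded as
    IEEE single precision (default) or double precision."""
    if double:
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
        exp_bits, frac_bits = 11, 52
    else:
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        exp_bits, frac_bits = 8, 23
    s = bits >> (exp_bits + frac_bits)                  # top bit
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)   # middle field
    frac = bits & ((1 << frac_bits) - 1)                # low field
    return s, exp, frac

print(float_fields(1.0))                # (0, 127, 0)
print(float_fields(-2.0, double=True))  # (1, 1024, 0)
```
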
"Normalized" Values
  • When: exp \(\neq 000\ldots0\) and exp \(\neq 111\ldots1\)
  • Exponent coded as a biased value: \(E = Exp - Bias\)
    • \(Exp\): unsigned value of the exp field. Because the exponent is stored as an unsigned biased value, two non-negative floating-point numbers can be ordered by comparing their bit patterns as unsigned integers.
    • \(Bias = 2^{k-1} - 1\), where \(k\) is the number of exponent bits
      Single precision: 127 (\(Exp\): 1…254, \(E\): -126…127)
      Double precision: 1023 (\(Exp\): 1…2046, \(E\): -1022…1023)
      (the \(Exp\) values 000…0 and 111…1 are excluded)
  • Significand coded with implied leading 1: \(M = 1.xxx\ldots x_2\)
    • \(xxx\ldots x\): bits of the frac field (the leading 1 is dropped, so we get one extra bit of precision for free)
    • Minimum when frac = 000…0 (\(M = 1.0\))
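
The normalized decoding rules above can be sketched for single precision as follows. The helper names `single_bits` and `decode_normalized_single` are assumptions made for this example. The value 15213.0 is chosen because it is exactly representable in single precision.

```python
import struct

def single_bits(x: float) -> int:
    """Bit pattern of x encoded as an IEEE single-precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def decode_normalized_single(x: float):
    """Return (Exp, E, M) for x stored as a normalized single-precision float."""
    bits = single_bits(x)
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 or exp == 0xFF:
        raise ValueError("not a normalized value")
    bias = 2 ** (8 - 1) - 1     # 127 for k = 8 exponent bits
    E = exp - bias
    M = 1 + frac / 2 ** 23      # implied leading 1 plus the frac bits
    return exp, E, M

exp, E, M = decode_normalized_single(15213.0)
print(exp, E, M)    # 140 13 1.8570556640625
print(M * 2 ** E)   # 15213.0

# For non-negative values, bit patterns sort in the same order as the numbers.
print(single_bits(0.5) < single_bits(1.0) < single_bits(15213.0))  # True
```
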
posted @ 2021-02-18 00:34  strategist_614