An Introduction to the IEEE 754 Standard for Binary Floating-Point Arithmetic

阿新 • Published: 2019-02-07

Single-precision 32 bit

A single-precision binary floating-point number is stored in 32 bits.

Bit values for the IEEE 754 32-bit float 0.15625

The exponent is biased by 2⁸⁻¹ − 1 = 127 in this case, so exponents in the range −126 to +127 are representable (see the above explanation to understand why biasing is done). An exponent of −127 would be biased to the value 0, but this is reserved to encode that the value is a denormalized number or zero. An exponent of 128 would be biased to the value 255, but this is reserved to encode an infinity or not a number (NaN). See the chart above.
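The effect of the bias can be checked directly. The following is a minimal Python sketch, assuming the standard `struct` module's `">f"` format (big-endian IEEE 754 binary32); the helper name `exponent_field` is illustrative and does not come from the text above.

```python
import struct

BIAS = 127  # 2**(8 - 1) - 1

def exponent_field(x: float) -> int:
    """Return the stored (biased) 8-bit exponent field of x encoded as a float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF

# Normalized values: the stored field is the true exponent plus 127.
for x in (1.0, 0.15625, 2.0 ** 127, 2.0 ** -126):
    stored = exponent_field(x)
    print(f"{x!r}: stored = {stored}, true exponent = {stored - BIAS}")

# Reserved field values: 0 for zero/denormalized numbers, 255 for infinity/NaN.
print(exponent_field(0.0), exponent_field(float("inf")))
```

The stored field for normalized values runs from 1 to 254, which corresponds to true exponents from −126 to +127.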

For normalised numbers, the most common case, Exp is the biased exponent and Fraction is the significand without its most significant bit.

The number has value v:

v = s × 2^e × m

Where

s = +1 (positive numbers) when the sign bit is 0

s = −1 (negative numbers) when the sign bit is 1

e = Exp − 127 (in other words the exponent is stored with 127 added to it, also called "biased with 127")

m = 1.fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of the fraction). Therefore, 1 ≤ m < 2.

In the example shown above, the sign is zero, the exponent is −3, and the significand is 1.01 (in binary, which is 1.25 in decimal). The represented number is therefore +1.25 × 2⁻³, which is +0.15625.
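The formula v = s × 2^e × m can be evaluated by hand from the three bit fields. The sketch below is one way to do that in Python for normalized numbers only; the function name `decode_normalized` is an illustrative choice, and the hexadecimal constant is simply the 32-bit pattern of the 0.15625 example written out.

```python
def decode_normalized(bits: int) -> float:
    """Evaluate v = s * 2**e * m for a normalized float32 bit pattern."""
    sign_bit = (bits >> 31) & 0x1
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exp_field in (0, 255):
        # 0 and 255 are reserved (zero/denormalized, infinity/NaN).
        raise ValueError("not a normalized number")
    s = -1.0 if sign_bit else 1.0
    e = exp_field - 127              # remove the bias
    m = 1.0 + fraction / 2 ** 23     # 1.fraction in binary
    return s * 2.0 ** e * m

# 0 01111100 01000000000000000000000 is the pattern for the example:
# sign 0, Exp 124 (e = -3), significand 1.01 in binary.
print(decode_normalized(0x3E200000))  # prints 0.15625
```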

Notes:

1. Denormalized numbers are the same except that e = −126 and m is 0.fraction. (e is NOT −127: the fraction has to be shifted to the right by one more bit in order to include the leading bit, which is not always 1 in this case. This is balanced by incrementing the exponent to −126 for the calculation.)

2. −126 is the smallest exponent for a normalized number.

3. There are two zeroes, +0 (s is 0) and −0 (s is 1).

4. There are two infinities, +∞ (s is 0) and −∞ (s is 1).

5. NaNs may have a sign and a fraction, but these have no meaning other than for diagnostics; the first bit of the fraction is often used to distinguish signaling NaNs from quiet NaNs.

6. NaNs and infinities have all 1s in the Exp field.

7. The positive and negative numbers closest to zero (represented by the denormalized value with all 0s in the Exp field and the binary value 1 in the Fraction field) are

±2⁻¹⁴⁹ ≈ ±1.4012985×10⁻⁴⁵

8. The positive and negative normalized numbers closest to zero (represented with the binary value 1 in the Exp field and 0 in the Fraction field) are

±2⁻¹²⁶ ≈ ±1.175494351×10⁻³⁸

9. The finite positive and finite negative numbers furthest from zero (represented by the value with 254 in the Exp field and all 1s in the Fraction field) are

±(1 − 2⁻²⁴) × 2¹²⁸ ≈ ±3.4028235×10³⁸
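The limits in the notes above can be confirmed from their bit patterns. This is a small Python sketch under the assumption that `struct`'s `">f"` format follows IEEE 754 binary32; `from_bits` is a hypothetical helper name, and the hexadecimal constants are the patterns described in notes 3 to 9.

```python
import math
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 float32."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

smallest_denormal = from_bits(0x00000001)  # Exp = 0, Fraction = 1
smallest_normal = from_bits(0x00800000)    # Exp = 1, Fraction = 0
largest_finite = from_bits(0x7F7FFFFF)     # Exp = 254, Fraction all 1s

print(smallest_denormal == 2.0 ** -149)
print(smallest_normal == 2.0 ** -126)
print(largest_finite == (1 - 2.0 ** -24) * 2.0 ** 128)

# The two zeroes compare equal but carry different sign bits.
neg_zero = from_bits(0x80000000)
print(neg_zero == 0.0, math.copysign(1.0, neg_zero))

# All 1s in Exp: a zero fraction is an infinity, a non-zero fraction is a NaN.
print(from_bits(0x7F800000), from_bits(0xFF800000))
print(math.isnan(from_bits(0x7FC00000)))
```

The comparisons are exact because Python floats are double precision, which can hold every single-precision value without rounding.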

Here is the summary table from the previous section with some 32-bit single-precision examples:

Type	Exponent field	Fraction field	Value
Zero	0000 0000	000 0000 0000 0000 0000 0000	0.0
One	0111 1111	000 0000 0000 0000 0000 0000	1.0
Denormalized number	0000 0000	100 0000 0000 0000 0000 0000	5.9×10⁻³⁹
Large normalized number	1111 1110	111 1111 1111 1111 1111 1111	3.4×10³⁸
Small normalized number	0000 0001	000 0000 0000 0000 0000 0000	1.18×10⁻³⁸
Infinity	1111 1111	000 0000 0000 0000 0000 0000	Infinity
NaN	1111 1111	non zero	NaN
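The rows of the table can be recomputed from the two bit fields. The following Python sketch classifies each row and evaluates it with the normalized or denormalized rule; the function name `classify` and the row labels are illustrative, the sign bit is taken as 0, and the fraction used for the NaN row is an arbitrary non-zero pattern chosen for the example.

```python
def classify(exp_bits: str, frac_bits: str):
    """Classify a float32 from its Exp and Fraction fields (sign bit taken as 0)."""
    exp_field = int(exp_bits.replace(" ", ""), 2)
    fraction = int(frac_bits.replace(" ", ""), 2)
    if exp_field == 255:
        if fraction == 0:
            return ("Infinity", float("inf"))
        return ("NaN", float("nan"))
    if exp_field == 0:
        if fraction == 0:
            return ("Zero", 0.0)
        # Denormalized: e = -126 and m = 0.fraction
        return ("Denormalized", 2.0 ** -126 * (fraction / 2 ** 23))
    # Normalized: e = Exp - 127 and m = 1.fraction
    return ("Normalized", 2.0 ** (exp_field - 127) * (1 + fraction / 2 ** 23))

rows = [
    ("0000 0000", "000 0000 0000 0000 0000 0000"),
    ("0111 1111", "000 0000 0000 0000 0000 0000"),
    ("0000 0000", "100 0000 0000 0000 0000 0000"),
    ("1111 1110", "111 1111 1111 1111 1111 1111"),
    ("0000 0001", "000 0000 0000 0000 0000 0000"),
    ("1111 1111", "000 0000 0000 0000 0000 0000"),
    ("1111 1111", "100 0000 0000 0000 0000 0000"),  # arbitrary non-zero fraction
]

for exp_bits, frac_bits in rows:
    kind, value = classify(exp_bits, frac_bits)
    print(f"{exp_bits}  {frac_bits}  {kind:<12}  {value!r}")
```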

A more complex example

Bit values for the IEEE 754 32-bit float −118.625

Let us encode the decimal number −118.625 using the IEEE 754 system.

1. First we need to get the sign, the exponent and the fraction. Because it is a negative number, the sign is "1".

2. Now, we write the number (without the sign; i.e. unsigned, no two's complement) using binary notation. The result is 1110110.101.

3. Next, let's move the radix point left, leaving only a 1 at its left: 1110110.101 = 1.110110101 × 2⁶. This is a normalized floating-point number. The fraction is the part to the right of the radix point, padded with zeros on the right until we get all 23 bits. That is 11011010100000000000000.

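4. The exponent is 6, but we need to convert it to binary and bias it (so the most negative exponent is 0, and all exponents are non-negative binary numbers). For the 32-bit IEEE 754 format, the bias is 127 and so 6 + 127 = 133. In binary, this is written as 10000101. Putting the three fields together (sign, exponent, fraction) gives the 32-bit pattern 1 10000101 11011010100000000000000, which is C2ED4000 in hexadecimal.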
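The encoding steps can be reproduced in a few lines of Python. This is a sketch rather than a full encoder: it assumes the input is a normal, non-zero value that is exactly representable in single precision (so no rounding is needed), and `encode_fields` is a hypothetical helper name. The result is cross-checked against the standard `struct` module.

```python
import math
import struct

def encode_fields(x: float):
    """Split a normal, exactly representable value into float32 bit fields."""
    sign = "1" if math.copysign(1.0, x) < 0 else "0"
    # frexp gives abs(x) = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(abs(x))
    e = exponent - 1      # rescale so that the significand m satisfies 1 <= m < 2
    m = mantissa * 2
    exp_bits = format(e + 127, "08b")                    # biased exponent
    frac_bits = format(int((m - 1) * 2 ** 23), "023b")   # drop the leading 1
    return sign, exp_bits, frac_bits

sign, exp_bits, frac_bits = encode_fields(-118.625)
print(sign, exp_bits, frac_bits)  # prints 1 10000101 11011010100000000000000

# Cross-check against the bits produced by the struct module.
(packed,) = struct.unpack(">I", struct.pack(">f", -118.625))
print(int(sign + exp_bits + frac_bits, 2) == packed)  # prints True
```

Values that need rounding, denormalized numbers, zeroes, infinities and NaNs are deliberately left out of this sketch.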
