Number Encodings

Floating-point

IEEE 754 encoding with sign, characteristic (Excess-q) and mantissa (fixed-point). Constant relative error.


How it works

Floating-point encoding $c_{GK,k,n}$ represents a real number in normalised form $m \cdot 2^e$: 1 sign bit, $n-k$ characteristic bits (Excess-q with $q = 2^{n-k-1} - 1$) and $k-1$ mantissa bits (fixed-point, without the leading 1). Reserved bit patterns encode zero, ±infinity, NaN and subnormals. The absolute rounding error grows with the exponent, but the relative error stays bounded by $2^{-k}$.

Rounding error

With floating-point encoding, the absolute error grows with the exponent while the relative error stays below a constant bound, so every magnitude is represented with the same relative precision.
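Both bounds follow from the normalised form, assuming round-to-nearest. With $k-1$ stored mantissa bits, neighbouring representable values with exponent $e$ lie $2^{e-(k-1)}$ apart, and rounding moves $x$ by at most half of that spacing:

```latex
x = m \cdot 2^e, \qquad 1 \le m < 2

|\tilde{x} - x| \le \tfrac{1}{2} \cdot 2^{e-(k-1)} = \frac{2^e}{2^k}

\frac{|\tilde{x} - x|}{|x|} \le \frac{2^e / 2^k}{m \cdot 2^e} = \frac{1}{m \cdot 2^k} \le \frac{1}{2^k}
```

The exponent $e$ cancels in the relative error, which is why the bound does not depend on the magnitude of $x$.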

Maximum absolute error
$2^e / 2^k$ (worst case at the largest exponent $e = q = 1$: $2^1 / 2^6 = 0.03125$)
Maximum relative error
$1 / 2^k = 1 / 2^6 = 0.015625$ (constant)


When to use

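In C, these formats are typically available as float (binary32, k=24, n=32) and double (binary64, k=53, n=64). Because subtracting nearly equal values causes catastrophic cancellation and floating-point addition is not associative, the order of operations matters: when summing, add values of similar magnitude together (for example, in ascending order of magnitude). Always compare with a tolerance rather than with ==.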