Floating-point
IEEE 754 encoding with sign, characteristic (Excess-q) and mantissa (fixed-point). Constant relative error.
How it works
Floating-point c_{GK,k,n} encodes a real in normalised form m * 2^e: 1 sign bit, n-k characteristic bits (Excess-q with q = 2^(n-k-1) - 1), k-1 mantissa bits (fixed-point without leading 1). Reserved bit patterns for zero, ±infinity, NaN and subnormals. Absolute rounding error grows with the exponent, but relative error stays bounded by 2^-k.
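The layout above can be sketched as a small decoder. This is a minimal illustration, not a reference implementation: the name gk_decode is hypothetical, the pattern is assumed to fit in 32 bits (2 <= k < n <= 32), and the handling of the reserved patterns (all-ones characteristic for infinity/NaN, zero characteristic for zero and subnormals with exponent 1 - q) is assumed to follow IEEE 754, which the text does not spell out.

```c
#include <math.h>
#include <stdint.h>

/* Decode an n-bit pattern with 1 sign bit, n-k characteristic bits and
   k-1 stored mantissa bits. Hypothetical helper; assumes 2 <= k < n <= 32
   and IEEE-754-style reserved patterns. */
double gk_decode(uint32_t bits, int n, int k)
{
    int      cbits = n - k;                        /* characteristic width   */
    uint32_t cmask = (UINT32_C(1) << cbits) - 1;   /* all-ones characteristic */
    uint32_t fmask = (UINT32_C(1) << (k - 1)) - 1; /* stored mantissa bits   */
    int      q     = (1 << (cbits - 1)) - 1;       /* q = 2^(n-k-1) - 1      */

    int      sign = (int)((bits >> (n - 1)) & 1);
    uint32_t c    = (bits >> (k - 1)) & cmask;
    uint32_t f    = bits & fmask;
    double   frac = ldexp((double)f, -(k - 1));    /* f / 2^(k-1), in [0, 1) */
    double   v;

    if (c == cmask)          /* reserved: infinity (f == 0) or NaN */
        v = (f == 0) ? INFINITY : NAN;
    else if (c == 0)         /* reserved: zero or subnormal, no hidden 1 */
        v = ldexp(frac, 1 - q);
    else                     /* normalised: m = 1.f, e = c - q */
        v = ldexp(1.0 + frac, (int)c - q);

    return sign ? -v : v;
}
```

For example, with n = 8 and k = 6 the pattern 0 10 11111 (0x5F) has characteristic 2, so e = 2 - 1 = 1 and m = 1 + 31/32, and gk_decode(0x5F, 8, 6) returns 3.9375. With n = 32 and k = 24 the same function reads binary32 patterns, e.g. gk_decode(0x3F800000, 32, 24) returns 1.0.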
Rounding error
For floating-point encoding the absolute error grows with the exponent, but the relative error stays bounded by a constant, so normalised values have the same relative precision at every magnitude.
- Maximum absolute error: 2^e / 2^k (worst case at e = q = 1 with k = 6: 2^1 / 2^6 = 0.03125)
- Maximum relative error: 1 / 2^k = 1 / 2^6 = 0.015625 (constant)
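The two bounds can be evaluated directly, and checked against a real format. The sketch below is illustrative only: the function names are hypothetical, round-to-nearest is assumed, and float_rel_error assumes that float is binary32 (k = 24) and that x is non-zero and within the normalised range.

```c
#include <math.h>

/* Maximum absolute rounding error for values with exponent e:
   2^e / 2^k, i.e. half the spacing between neighbouring values. */
double gk_max_abs_error(int e, int k)
{
    return ldexp(1.0, e - k);
}

/* Maximum relative rounding error, independent of the exponent: 1 / 2^k. */
double gk_max_rel_error(int k)
{
    return ldexp(1.0, -k);
}

/* Measured relative error when the double x is rounded to float.
   Assumes x != 0 and float == binary32. */
double float_rel_error(double x)
{
    float rounded = (float)x;
    return fabs(((double)rounded - x) / x);
}
```

With k = 6 and e = 1, gk_max_abs_error(1, 6) gives 0.03125 and gk_max_rel_error(6) gives 0.015625, matching the figures in the list. For binary32, float_rel_error(0.1) stays below gk_max_rel_error(24).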
When to use
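In C typically provided as float (binary32, k=24, n=32) and double (binary64, k=53, n=64). Floating-point addition is not associative, and subtracting nearly equal values causes catastrophic cancellation, so sum numbers grouped by magnitude (for example, smallest first) to keep small terms from being absorbed by large ones. Always compare with a tolerance instead of ==.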
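Both recommendations can be sketched in a few lines. All names below are hypothetical, float is assumed to be binary32 with arithmetic carried out in single precision, and sorting by ascending magnitude is only one simple way of grouping by magnitude. Suitable tolerances depend on the application.

```c
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* True if a and b differ by at most rel_tol relative to the larger
   magnitude, or by at most abs_tol (useful near zero). */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = fabs(a - b);
    double scale = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
    double tol   = rel_tol * scale;
    return diff <= (tol > abs_tol ? tol : abs_tol);
}

/* qsort comparator: ascending by absolute value. */
static int by_magnitude(const void *pa, const void *pb)
{
    float a = fabsf(*(const float *)pa);
    float b = fabsf(*(const float *)pb);
    return (a > b) - (a < b);
}

/* Plain left-to-right summation in single precision. */
float sum_left_to_right(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Sorts x in place by magnitude, then sums the small terms first. */
float sum_small_first(float *x, size_t n)
{
    qsort(x, n, sizeof *x, by_magnitude);
    return sum_left_to_right(x, n);
}
```

As an illustration, take the array {1e8f, 1, 1, 1, 1, 1, 1, 1, 1}. Neighbouring binary32 values near 1e8 are 8 apart, so adding 1 to 1e8f rounds back to 1e8f every time and the left-to-right sum loses all eight small terms. Summing the small terms first accumulates them to 8 before the large term is added. Likewise, 0.1 + 0.2 == 0.3 is false in binary64, while nearly_equal(0.1 + 0.2, 0.3, 1e-9, 1e-12) is true.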