Subtract Float32 number from Float64 number - what's the rule?

stevengj · August 29, 2023, 12:22pm

In particular, 1.0f-45 is internally rounded to a “subnormal” Float32 value that effectively has only a single significant binary digit, so it is actually 2^-149:

julia> 2^-149
1.401298464324817e-45

(In single precision, it prints as 1.0f-45 because that’s the shortest decimal value that rounds to the same Float32 value. This is fundamental to how floating-point numbers are printed in decimal.)
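A quick way to check this yourself is to look at the raw bits. The following is only a sketch; the variable names are mine, and it uses nothing beyond Base functions (`reinterpret`, `nextfloat`, `ldexp`, `issubnormal`, `bitstring`):

```julia
tiny = 1.0f-45

# Raw IEEE 754 bit pattern: only the lowest significand bit is set.
bits = reinterpret(UInt32, tiny)

# It is the smallest positive Float32, i.e. the value right after zero.
is_smallest = tiny === nextfloat(0.0f0)

# Widening to Float64 is exact, and the result is exactly 2^-149.
exact = Float64(tiny) == ldexp(1.0, -149)

# Different decimal literals can round to this same Float32 value;
# printing picks the shortest one, which is why you see 1.0f-45.
same_literal = 1.4f-45 === tiny

println(bitstring(tiny))    # 31 zeros followed by a single 1
println(issubnormal(tiny))  # true
```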

If you try to make the number any smaller in single precision, it underflows to zero:

julia> 1.0f-45 / 2
0.0f0
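To see how much room there is below the smallest normal value before you hit zero, you can count halvings. This is just an illustrative helper (`halvings_until_zero` is a name I made up), and it assumes the default IEEE mode where subnormals are not flushed to zero:

```julia
# Hypothetical helper: halve x repeatedly and count the steps until it is zero.
function halvings_until_zero(x::Float32)
    n = 0
    while x != 0
        x /= 2
        n += 1
    end
    return n
end

# floatmin(Float32) is 2^-126 and the smallest subnormal is 2^-149, so there
# are 23 subnormal halvings, and the next one rounds to zero.
halvings_until_zero(floatmin(Float32))  # 24
halvings_until_zero(1.0f-45)            # 1
```

Each of those halvings below `floatmin(Float32)` costs one bit of precision, which is the “gradual underflow” that subnormals provide.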

If you want to use Float32 arithmetic with the full ≈7-digit precision, you need to keep magnitudes between ≈10^-38 and ≈10^38:

julia> floatmin(Float32), floatmax(Float32)
(1.1754944f-38, 3.4028235f38)
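One way to quantify the precision loss below `floatmin` is to compare the gap to the next representable value with the value itself. The helper name `relative_spacing` below is my own, not something from Base:

```julia
# Hypothetical helper: gap to the next representable value, relative to x.
relative_spacing(x::AbstractFloat) = (nextfloat(x) - x) / x

# In the normal range the relative spacing at a power of two is eps(Float32),
# about 1.2e-7, which is where the ≈7 digits come from.
relative_spacing(1.0f0)
relative_spacing(floatmin(Float32))

# At the smallest subnormal the next value is twice as large,
# so the relative spacing is 1.0, i.e. no significant digits to speak of.
relative_spacing(1.0f-45)
```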

In fact, for performance reasons, you sometimes want to just treat subnormal values as zero:

julia> 3.0f-45 / 2 # yields subnormal
1.0f-45

julia> set_zero_subnormals(true)
true

julia> 3.0f-45 / 2 # now underflows to zero
0.0f0
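Since `set_zero_subnormals` changes floating-point behavior for everything that runs afterwards on the current thread, you may prefer to limit it to one block of code. Here is a minimal sketch of that idea; `with_zero_subnormals` is a hypothetical helper, not a Base function, and it relies only on `get_zero_subnormals` and `set_zero_subnormals`:

```julia
# Hypothetical helper (not part of Base): run f() with subnormals flushed to
# zero, then restore whatever mode the thread was in before.
function with_zero_subnormals(f)
    previous = get_zero_subnormals()
    # set_zero_subnormals(true) returns false if the hardware cannot do this.
    set_zero_subnormals(true) || @warn "hardware cannot flush subnormals to zero"
    try
        return f()
    finally
        set_zero_subnormals(previous)
    end
end

x = 3.0f-45
halved = with_zero_subnormals() do
    x / 2   # typically 0.0f0 here; hardware is permitted, not required, to flush
end

get_zero_subnormals()  # back to the previous setting (false by default)
```

Keep in mind that flushing can break identities such as `(x - y == 0) == (x == y)`, so it is best reserved for code where tiny values really are noise.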
