Subtract Float32 number from Float64 number - what's the rule?

In particular, 1.0f-45 is internally rounded to a “subnormal” Float32 value that effectively has only a single significant binary digit, so its actual value is 2^-149:

julia> 2^-149
1.401298464324817e-45

(In single precision, it prints as 1.0f-45 because that’s the shortest decimal value that rounds to the same Float32 value. This is fundamental to how floating-point numbers are printed in decimal.)
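To tie this back to the question in the title, here is a small sketch of what happens when that Float32 value meets a Float64. The variable names are just for illustration. The point is that the Float32 operand is promoted to Float64 exactly, so the subtraction sees 2^-149 rather than the decimal 1.0e-45:

```julia
x = 1.0f-45                        # Float32 literal; rounds to the nearest Float32

# The literal lands on the smallest positive Float32, which is subnormal.
@assert x == nextfloat(0.0f0)
@assert issubnormal(x)

# Converting to Float64 is exact, so the "hidden" digits become visible.
@assert Float64(x) == 2.0^-149

# Mixed Float64 - Float32 arithmetic promotes the Float32 operand first.
d = 1.0e-45 - x
@assert d isa Float64
@assert d == 1.0e-45 - 2.0^-149    # same as subtracting 2^-149 in Float64
@assert d != 0                     # so the two "1e-45" values do not cancel
```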

If you try to make the number any smaller in single precision, it underflows to zero:

julia> 1.0f-45 / 2
0.0f0
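A quick way to convince yourself of this is to look at the neighbours of that value. This is only a sketch and the variable names are mine. It assumes the default IEEE mode, i.e. that subnormals have not been flushed to zero:

```julia
tiny = nextfloat(0.0f0)                     # smallest positive Float32, 2^-149

half = tiny / 2                             # exact result 2^-150 is a tie between 0 and 2^-149
three_quarters = Float32(0.75 * 2.0^-149)   # computed exactly in Float64, then rounded

@assert prevfloat(tiny) == 0.0f0            # nothing representable between 0 and 2^-149
@assert half == 0.0f0                       # the tie rounds to the even neighbour, zero
@assert three_quarters == tiny              # closer to 2^-149, so it rounds up instead
```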

If you want to use Float32 arithmetic with the full ≈7-digit precision, you need to keep magnitudes between ≈10^-38 and ≈10^38:

julia> floatmin(Float32), floatmax(Float32)
(1.1754944f-38, 3.4028235f38)
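To see the loss of precision below floatmin directly, one option is to compare the relative gap to the next representable number. The helper relgap below is a hypothetical name for this post, not something from Base:

```julia
# Relative distance from x to the next larger Float32: a rough measure of precision at x.
relgap(x::Float32) = (nextfloat(x) - x) / x

@assert relgap(1.0f0) == eps(Float32)               # 2^-23, i.e. ≈7 decimal digits
@assert relgap(floatmin(Float32)) == eps(Float32)   # still full precision at the edge
@assert relgap(nextfloat(0.0f0)) == 1.0f0           # smallest subnormal: next value is 2x larger
```

Between those last two cases the relative gap grows steadily as you move down through the subnormal range.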

In fact, for performance reasons, you sometimes want to just treat subnormal values as zero:

julia> 3.0f-45 / 2 # yields subnormal
1.0f-45

julia> set_zero_subnormals(true)
true

julia> 3.0f-45 / 2 # now underflows to zero
0.0f0
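Since set_zero_subnormals changes floating-point state for the current thread until you switch it back, you may want to scope it. Here is a minimal sketch. The helper with_zero_subnormals is hypothetical (it is not part of Base) and uses only get_zero_subnormals and set_zero_subnormals:

```julia
# Hypothetical helper: run f with subnormals flushed to zero, then restore the old mode.
function with_zero_subnormals(f)
    old = get_zero_subnormals()
    set_zero_subnormals(true)      # returns false if the hardware does not support it
    try
        return f()
    finally
        set_zero_subnormals(old)   # restore even if f throws
    end
end

x = 3.0f-45                        # rounds to 2 * 2^-149
y = with_zero_subnormals(() -> x / 2)

# Flushing is permitted but not required, so y is either zero or the subnormal 2^-149.
@assert y == 0.0f0 || y == nextfloat(0.0f0)
```

Afterwards, x / 2 at the prompt should give the subnormal again, assuming the mode was off before the call.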