Float construction function is lossy

jar1 · September 29, 2023, 10:55pm

julia> let x = 2^53 + 1
           Int(float(x)) - x
       end
-1

How do we feel about float(x) being lossy? That behavior is not documented. I’m wondering if it should throw an error.

Oscar_Smith · September 29, 2023, 11:05pm

There really isn’t a good alternative. Do you want 1/3 to error? If so, how do you want people to construct the closest Float64 to 1//3? Should people be required to write 6004799503160661/18014398509481984?
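
A quick sketch of that point, assuming a Julia 1.x session: `Rational(x)` recovers the exact value a Float64 stores, which shows that 1/3 is already rounded to the fraction quoted above.

```julia
# 1/3 silently rounds to the nearest Float64.
x = 1 / 3

# Rational(x) recovers the exact value stored in the Float64.
# It is the fraction quoted above (denominator 2^54), not 1//3.
exact = Rational(x)
@assert exact == 6004799503160661 // 18014398509481984
@assert exact != 1 // 3

# Converting the rational 1//3 rounds to that same Float64.
@assert Float64(1 // 3) === x
```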

Benny · September 30, 2023, 12:33am

IIRC, Float64’s significand has an implicit leading 1 bit and stores the next 52 bits, so read as an integer it ranges over 2^52:2^53-1. Smaller positive integers, and larger integers with enough trailing 0s, can be represented exactly by this significand range along with the exponent.

So, squinting at this example, you provided an integer with too many bits between the leading and trailing 1s for Float64 to store losslessly. Everything in -2^53:2^53 is safe; e.g. 2^52+1 sets all stored bits to 0 except the trailing bit.
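
The bit-counting rule above can be sketched as follows. The helper names `significant_bits` and `fits_in_float64` are made up for illustration, and the literals assume a 64-bit `Int`.

```julia
# Hypothetical helper: number of bits from the leading 1 to the trailing 1.
function significant_bits(x::Int64)
    x == 0 && return 0
    u = unsigned(abs(x))   # for typemin(Int64), abs wraps but the bit pattern is still 2^63
    return 64 - leading_zeros(u) - trailing_zeros(u)
end

# Hypothetical predicate: x converts exactly if those bits fit in the 53-bit significand.
fits_in_float64(x::Int64) = significant_bits(x) <= precision(Float64)

for x in (2^52 + 1, 2^53, 2^53 + 1, 2^53 + 2, 2^62, typemax(Int64))
    # Int-to-Float64 comparison with == is exact in Julia, so this is a fair cross-check.
    @assert fits_in_float64(x) == (Float64(x) == x)
end
```

Only `2^53 + 1` and `typemax(Int64)` fail the predicate in this list; `2^53 + 2` passes because its trailing 0 brings it back to 53 significant bits.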

jar1 · September 30, 2023, 8:37pm

What about with convert? I’m less comfortable with losing information implicitly in convert than in an explicit call to float.

julia> x = 2^53+1
9007199254740993

julia> Int(convert(Float64, x))
9007199254740992

Sukera · September 30, 2023, 9:14pm

That is also explicitly documented to be lossy for AbstractFloat types:

help?> convert
search: convert const collect

  convert(T, x)

  Convert x to a value of type T.

[...]

  If `T` is a `AbstractFloat` type, then it will return the closest value to `x` representable by `T`.

[...]

jar1 · September 30, 2023, 9:20pm

Documented, yes, but good?

Benny · September 30, 2023, 11:14pm

convert and float both end up doing the same Float64(x) call anyway, and all methods with implicit promotion do a T(x) call at some point. That T(x) call is also where an InexactError would be thrown when T is an integer type.
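
A small sketch of that equivalence, assuming a 64-bit `Int`: the three spellings and an implicit promotion all give the same rounded value, while the integer constructor throws instead of rounding.

```julia
x = 2^53 + 1   # assumes 64-bit Int

# All three spellings produce the same rounded Float64.
@assert float(x) === Float64(x)
@assert convert(Float64, x) === Float64(x)
@assert Float64(x) == 2.0^53

# Implicit promotion in mixed arithmetic rounds the same way.
@assert x + 0.0 == 2.0^53

# In the other direction, the integer constructor refuses to round.
threw = try
    Int(1.5)
    false
catch err
    err isa InexactError
end
@assert threw
```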

Floating-point numbers are designed to continue through precision loss. It doesn’t make much sense (and is too late for both Julia v1 and IEEE-754) to throw errors in one specific case. I don’t think anybody is really “comfortable”; we’d all prefer never to lose precision, but that’s not possible with fixed-size memory.

If you really must check that the lossy methods didn’t lose precision, you could check afterwards. It’s easy in this case:

julia> function exactfloat(x::Integer)
         xf = float(x)
         x == xf ? xf : throw(InexactError(:exactfloat, typeof(xf), x))
       end
exactfloat (generic function with 1 method)

julia> exactfloat(2^53+1)
ERROR: InexactError: exactfloat(Float64, 9007199254740993)
...

Admittedly, that error message isn’t quite right, but the point is you can do anything after the check, including throwing an error.
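
One possible variant, as a sketch only: if the target type is passed explicitly, the `InexactError` message has the same shape as the call that raised it. The two-argument signature and the arbitrary-precision comparison are illustrative choices, not something from Base.

```julia
# Hypothetical variant: the target type is an explicit argument, so
# InexactError(:exactfloat, T, x) prints in the same shape as the call.
function exactfloat(::Type{T}, x::Integer) where {T<:AbstractFloat}
    xf = T(x)
    # Compare in arbitrary precision so the check itself cannot round.
    big(x) == big(xf) || throw(InexactError(:exactfloat, T, x))
    return xf
end

exactfloat(Float64, 2^53)        # 9.007199254740992e15
exactfloat(Float32, 2^24)        # 1.6777216f7
# exactfloat(Float64, 2^53 + 1)  # throws InexactError
```

The `big` comparison is slower than `x == xf`, but it avoids relying on how `==` promotes each integer/float pair.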
