Edge case in Sets with Floats

fatteneder · March 31, 2022, 8:46am

Hi Julia Community,

recently I was approached by a colleague who just got started with Julia and ran into a problem that took him ages to debug. We then seat down together and could come up with the following MWE:

julia> 0 in Set(-0)
True
julia> 0.0 in Set(-0.0)
False

Although, unexpected from a mathematical point of view, the result is plausible if you look into the implementation: The elements of a Set{T} are stored as the keys of a Dict{T,Nothing}. Using in then boils down to hashing the keys and comparing those. And since bitstring(0) == bitstring(-0) and bitstring(0.0) != bitstring(-0.0) we get the above result.

Can this be considered bug? Or is it assumed that users of Set should know that comparisons are done using hash instead of isequal?
Unfortunately, the docstring of Set does not say anything about his behavior: Collections and Data Structures · The Julia Language

Workaround (not recommended, because of type piracy): Collect the keys and use in for Vectors which falls back to ==:

Base.in(x, s::Set{T}) where T<:Real = x in collect(keys(s.dict))

But I am not sure whether this can be added without breaking anything else within Julia.

Tested with versioninfo():

julia> versioninfo()
julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)

Cheers

lawless-m · March 31, 2022, 8:58am

julia> isequal(0.0, -0.0)
false

https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-point-zero

The documentation for in however, does document this behaviour

https://docs.julialang.org/en/v1.7/base/collections/#Base.in

For example, Set s check whether the item isequal to one of the elements.

nilshg · March 31, 2022, 9:02am

And I guess

in particular

The IEEE 754 standard for floating-point arithmetic (presently used by most computers and programming languages that support floating-point numbers) requires both +0 and −0.

fatteneder · March 31, 2022, 9:37am

With the above links I should rephrase my question:

Can this be considered bug? Or is it assumed that users of Set should know that comparisons are done using hash ~~instead of isequal~~ and isequal (applied to the hashes)?

The first one can be answered by no and the second is answered by lawless-m’s reply.

Regarding the ‘workaround’:

Base.in(x, s::Set{T}) where T<:Real = x in collect(keys(s.dict))

This does not fall back to isapprox, but instead to == which can be checked with @edit 0.0 in [-0.0].
And == is not the same as isequal in the case of floating point numbers, which I wasn’t aware of. See also (Essentials · The Julia Language)

Similar to ==, except for the treatment of floating point numbers and of missing values.

Thank you for your answers!

Sukera · March 31, 2022, 9:56am

Note that you’re defining a method for a function you didn’t define, for types you don’t own (i.e. not your own types). This can (will) lead to unexpected behavior for users of your code, since that has non-local effects on how the code of others behaves when your code is loaded as well. This is called “type piracy” and (if required to solve a problem) is generally a bad idea for the reasons mentioned above.

fatteneder · March 31, 2022, 11:18am

Thanks for explicitly pointing that out and (since I am already aware of type piracy) I second this. I ll update my initial post to make it clear.

stevengj · March 31, 2022, 12:25pm

Besides the fact that this is type piracy, the performance of this will be truly awful.

You could define a wrapper type for your keys that impelments a custom hash and isequal which ignore the sign of zero. This also avoids type piracy. For example:

struct FloatKey{T<:AbstractFloat}
    val::T
end
Base.isequal(x::FloatKey, y::FloatKey) = iszero(x.val) ? x.val == y.val : isequal(x.val, y.val)
Base.hash(x::FloatKey, h::UInt) = iszero(x.val) && signbit(x.val) ? hash(-x.val, h) : hash(x.val, h)
Base.ht_keyindex(d::Dict{FloatKey{T}}, x::Real) where {T, V} = Base.ht_keyindex(d, FloatKey(T(x)))

which gives

julia> s = Set(FloatKey.([-0.0, 0.0, 1.0, 2.0]))
Set{FloatKey{Float64}} with 3 elements:
  FloatKey{Float64}(0.0)
  FloatKey{Float64}(2.0)
  FloatKey{Float64}(1.0)

julia> 0.0 in s
true

julia> -0.0 in s
true

julia> 3.0 in s
false

julia> 1.0 in s
true

Topic		Replies	Views
Possible bug in unique/Set Internals & Design faq	49	4418	June 15, 2018
Comparing floating point numbers in an intuitive way fails within julia Internals & Design	6	1260	December 7, 2018
Why is Julia designed this way? NaN != NaN but -0.0 == 0.0 New to Julia question	11	1047	November 11, 2021
-0.0 == 0.0, but issorted([0.0, -0.0]) returns false General Usage sorting , comparison	4	148	July 23, 2025
Set inconsistencies with structs General Usage question	4	480	May 6, 2021

Edge case in Sets with Floats

Related topics