Edge case in Sets with Floats

Hi Julia Community,

recently I was approached by a colleague who just got started with Julia and ran into a problem that took him ages to debug. We then seat down together and could come up with the following MWE:

julia> 0 in Set(-0)
julia> 0.0 in Set(-0.0)

Although, unexpected from a mathematical point of view, the result is plausible if you look into the implementation: The elements of a Set{T} are stored as the keys of a Dict{T,Nothing}. Using in then boils down to hashing the keys and comparing those. And since bitstring(0) == bitstring(-0) and bitstring(0.0) != bitstring(-0.0) we get the above result.

Can this be considered bug? Or is it assumed that users of Set should know that comparisons are done using hash instead of isequal?
Unfortunately, the docstring of Set does not say anything about his behavior: Collections and Data Structures · The Julia Language

Workaround (not recommended, because of type piracy): Collect the keys and use in for Vectors which falls back to ==:

Base.in(x, s::Set{T}) where T<:Real = x in collect(keys(s.dict))

But I am not sure whether this can be added without breaking anything else within Julia.

Tested with versioninfo():

julia> versioninfo()
julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)


julia> isequal(0.0, -0.0)


The documentation for in however, does document this behaviour


For example, Set s check whether the item isequal to one of the elements.

1 Like

And I guess

in particular

The IEEE 754 standard for floating-point arithmetic (presently used by most computers and programming languages that support floating-point numbers) requires both +0 and −0.

1 Like

With the above links I should rephrase my question:

Can this be considered bug? Or is it assumed that users of Set should know that comparisons are done using hash instead of isequal and isequal (applied to the hashes)?

The first one can be answered by no and the second is answered by lawless-m’s reply.

Regarding the ‘workaround’:

Base.in(x, s::Set{T}) where T<:Real = x in collect(keys(s.dict))

This does not fall back to isapprox, but instead to == which can be checked with @edit 0.0 in [-0.0].
And == is not the same as isequal in the case of floating point numbers, which I wasn’t aware of. See also (Essentials · The Julia Language)

Similar to ==, except for the treatment of floating point numbers and of missing values.

Thank you for your answers!

Note that you’re defining a method for a function you didn’t define, for types you don’t own (i.e. not your own types). This can (will) lead to unexpected behavior for users of your code, since that has non-local effects on how the code of others behaves when your code is loaded as well. This is called “type piracy” and (if required to solve a problem) is generally a bad idea for the reasons mentioned above.

Thanks for explicitly pointing that out and (since I am already aware of type piracy) I second this. I ll update my initial post to make it clear.

Besides the fact that this is type piracy, the performance of this will be truly awful.

You could define a wrapper type for your keys that impelments a custom hash and isequal which ignore the sign of zero. This also avoids type piracy. For example:

struct FloatKey{T<:AbstractFloat}
Base.isequal(x::FloatKey, y::FloatKey) = iszero(x.val) ? x.val == y.val : isequal(x.val, y.val)
Base.hash(x::FloatKey, h::UInt) = iszero(x.val) && signbit(x.val) ? hash(-x.val, h) : hash(x.val, h)
Base.ht_keyindex(d::Dict{FloatKey{T}}, x::Real) where {T, V} = Base.ht_keyindex(d, FloatKey(T(x)))

which gives

julia> s = Set(FloatKey.([-0.0, 0.0, 1.0, 2.0]))
Set{FloatKey{Float64}} with 3 elements:

julia> 0.0 in s

julia> -0.0 in s

julia> 3.0 in s

julia> 1.0 in s