Recently I was approached by a colleague who had just started with Julia and ran into a problem that took him ages to debug. We then sat down together and came up with the following MWE:
julia> 0 in Set(-0)
true
julia> 0.0 in Set(-0.0)
false
Although unexpected from a mathematical point of view, the result is plausible if you look into the implementation: the elements of a Set{T} are stored as the keys of a Dict{T,Nothing}. Using in then boils down to hashing the keys and comparing those. And since bitstring(0) == bitstring(-0) and bitstring(0.0) != bitstring(-0.0), we get the above result.
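To make the bit-level difference concrete, here is a small sketch using only Base functions. It only illustrates the representations, not the Set internals:

```julia
# Integers have no signed zero: -0 is the very same value as 0.
println(bitstring(0) == bitstring(-0))       # true

# IEEE 754 floats carry a sign bit, so 0.0 and -0.0 differ in the leading bit.
println(bitstring(0.0))                      # 64 zeros
println(bitstring(-0.0))                     # a 1 followed by 63 zeros

# signbit reads that leading bit directly.
println(signbit(0.0), " ", signbit(-0.0))    # false true
```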
Can this be considered a bug? Or is it assumed that users of Set should know that comparisons are done using hash instead of isequal?
Unfortunately, the docstring of Set does not say anything about this behavior: Collections and Data Structures · The Julia Language
Workaround (not recommended, because of type piracy): collect the keys and use in for Vectors, which falls back to ==:
Base.in(x, s::Set{T}) where T<:Real = x in collect(keys(s.dict))
But I am not sure whether this can be added without breaking anything else within Julia.
Tested with versioninfo():
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.5.0)
CPU: Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, haswell)
The IEEE 754 standard for floating-point arithmetic (presently used by most computers and programming languages that support floating-point numbers) requires both +0 and −0.
With the above links I should rephrase my question:
Can this be considered a bug? Or is it assumed that users of Set should know that comparisons are done using hash and isequal (applied to the hashes)?
The first one can be answered with "no", and the second is answered by lawless-m's reply.
Regarding the "workaround":
Base.in(x, s::Set{T}) where T<:Real = x in collect(keys(s.dict))
This does not fall back to isapprox, but instead to == which can be checked with @edit 0.0 in [-0.0].
And == is not the same as isequal in the case of floating point numbers, which I wasn't aware of. See also (Essentials · The Julia Language)
Similar to ==, except for the treatment of floating point numbers and of missing values.
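A short sketch of how the two notions of equality diverge for floats, and how that shows up in Vector versus Set membership. The variable names are only for illustration:

```julia
pos, neg = 0.0, -0.0

# == follows IEEE 754 semantics: the two zeros compare equal, NaN never does.
println(pos == neg)           # true
println(NaN == NaN)           # false

# isequal distinguishes the two zeros and treats NaN as equal to itself.
println(isequal(pos, neg))    # false
println(isequal(NaN, NaN))    # true

# Membership in a Vector uses ==, membership in a Set uses hash and isequal.
println(pos in [neg])         # true
println(pos in Set([neg]))    # false
```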
Note that you're defining a method for a function you didn't define, for types you don't own (i.e. not your own types). This can (will) lead to unexpected behavior for users of your code, since that has non-local effects on how the code of others behaves when your code is loaded as well. This is called "type piracy" and (if required to solve a problem) is generally a bad idea for the reasons mentioned above.
Besides the fact that this is type piracy, the performance of this will be truly awful.
You could define a wrapper type for your keys that implements a custom hash and isequal which ignore the sign of zero. This also avoids type piracy. For example:
julia> s = Set(FloatKey.([-0.0, 0.0, 1.0, 2.0]))
Set{FloatKey{Float64}} with 3 elements:
FloatKey{Float64}(0.0)
FloatKey{Float64}(2.0)
FloatKey{Float64}(1.0)
julia> 0.0 in s
true
julia> -0.0 in s
true
julia> 3.0 in s
false
julia> 1.0 in s
true
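The definition of FloatKey is not included in the session above. One possible sketch that reproduces those results is given below. Everything in it is an assumption, including the helper name canonical and the extra in method that lets plain numbers be looked up directly:

```julia
# Hypothetical wrapper type; only the name FloatKey is taken from the session above.
struct FloatKey{T<:AbstractFloat}
    x::T
end

# Hypothetical helper: map -0.0 to +0.0 and leave every other value untouched.
canonical(k::FloatKey) = iszero(k.x) ? zero(k.x) : k.x

# Set/Dict lookups use hash and isequal, so both must ignore the sign of zero.
Base.isequal(a::FloatKey, b::FloatKey) = isequal(canonical(a), canonical(b))
Base.hash(k::FloatKey, h::UInt) = hash(canonical(k), h)

# Convenience (an assumption): wrap plain numbers before the lookup.
# Set{<:FloatKey} involves a type we own, so this is not type piracy.
Base.in(x::Real, s::Set{<:FloatKey}) = FloatKey(float(x)) in s

s = Set(FloatKey.([-0.0, 0.0, 1.0, 2.0]))
println(length(s))                  # 3
println(0.0 in s, " ", -0.0 in s)   # true true
println(3.0 in s)                   # false
```

Unlike collecting the keys into a Vector on every query, lookups here still go through the hash table.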