Hi all,
I was working with sets and self defined structs and noticed the following:
julia> struct A a end
Base.:(==)(a::A,b::A) = true
set = Set{A}()
push!(set, A(1))
Set{A} with 1 element:
A(1)
julia> A(2) in set
true
julia> A(7) in set
false
My expectation would have been for the last two lines to be consistent.
While experimenting with this I found that the problem is with different hashes being generated for different objects. It is apparent from the definition of Set
that an equivalence relation is required to compare different objects in the Set
. This equivalence relation seems to be:
a~b \Leftrightarrow (a == b && hash(a) == hash(b))
as shown by:
julia> struct A a end
Base.:(==)(a::A,b::A) = true
Base.hash(a::A) = UInt(1)
set = Set{A}()
push!(set, A(1))
Set{A} with 1 element:
A(1)
julia> A(2) in set
true
julia> A(7) in set
true
This leads to undefined behaviour when using self defined structs with custom Base.:(==)
in Sets or Dicts.
Is this intentional?
Yes, as per the docstrings for hash
/==
/isequal
:
help?> isequal
search: isequal issetequal
isequal(x, y)
Similar to ==, except for the treatment of floating point numbers and of missing
values. isequal treats all floating-point NaN values as equal to each other,
treats -0.0 as unequal to 0.0, and missing as equal to missing. Always returns a
Bool value.
Implementation
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
The default implementation of isequal calls ==, so a type that does not involve
floating-point values generally only needs to define ==.
isequal is the comparison function used by hash tables (Dict). isequal(x,y) must
imply that hash(x) == hash(y).
This typically means that types for which a custom == or isequal method exists
must implement a corresponding hash method (and vice versa). Collections
typically implement isequal by calling isequal recursively on all contents.
Scalar types generally do not need to implement isequal separate from ==, unless
they represent floating-point numbers amenable to a more efficient
implementation than that provided as a generic fallback (based on isnan,
signbit, and ==).
help?> hash
search: hash hasmethod haskey hasfield hasproperty skipchars Threads MathConstants
hash(x[, h::UInt])
Compute an integer hash code such that isequal(x,y) implies hash(x)==hash(y).
The optional second argument h is a hash code to be mixed with the result.
New types should implement the 2-argument form, typically by calling the
2-argument hash method recursively in order to mix hashes of the contents with
each other (and with h). Typically, any type that implements hash should also
implement its own == (hence isequal) to guarantee the property mentioned above.
Types supporting subtraction (operator -) should also implement widen, which is
required to hash values inside heterogeneous arrays.
2 Likes
Thanks for your answer.
I am actually more confused now.
Compute an integer hash code such that isequal(x,y) implies hash(x)==hash(y).
This part of the documentation from hash()
sounds to me like changing the (==)
function should automatically change the hash function as well.
Also:
Typically, any type that implements hash should also
implement its own == (hence isequal) to guarantee the property mentioned above.
this explains, that changing hash() also requires you to change (==)
. But nowhere is the inverse mentioned.
No, that is up to the programmer. That’s kind of the point, it cannot be done automatically.
The documentation of ==
does say:
If your type will be used as a dictionary key, it should therefore also implement hash.
but I agree that could be more clearly and prominently stated.
What phrasing of the documentation of ==
would have helped you avoid the misunderstanding? If we can come up with something better, then we can improve the docs and hopefully save the next person from going through the same trouble you did.
2 Likes