Set inconsistencies with structs

Hi all,

I was working with sets and self defined structs and noticed the following:

julia> struct A a end
       Base.:(==)(a::A,b::A) = true
       set = Set{A}()
       push!(set, A(1))
Set{A} with 1 element:
  A(1)

julia> A(2) in set
true

julia> A(7) in set
false

My expectation would have been for the last two lines to be consistent.
While experimenting with this I found that the problem is with different hashes being generated for different objects. It is apparent from the definition of Set that an equivalence relation is required to compare different objects in the Set. This equivalence relation seems to be:

a~b \Leftrightarrow (a == b && hash(a) == hash(b))

as shown by:

julia> struct A a end
       Base.:(==)(a::A,b::A) = true
       Base.hash(a::A) = UInt(1)   
       set = Set{A}()
       push!(set, A(1))
Set{A} with 1 element:
  A(1)

julia> A(2) in set
true

julia> A(7) in set
true

This leads to undefined behaviour when using self defined structs with custom Base.:(==) in Sets or Dicts.
Is this intentional?

Yes, as per the docstrings for hash/==/isequal:

help?> isequal
search: isequal issetequal

  isequal(x, y)

  Similar to ==, except for the treatment of floating point numbers and of missing
  values. isequal treats all floating-point NaN values as equal to each other,
  treats -0.0 as unequal to 0.0, and missing as equal to missing. Always returns a
  Bool value.

  Implementation
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  The default implementation of isequal calls ==, so a type that does not involve
  floating-point values generally only needs to define ==.

  isequal is the comparison function used by hash tables (Dict). isequal(x,y) must
  imply that hash(x) == hash(y).

  This typically means that types for which a custom == or isequal method exists
  must implement a corresponding hash method (and vice versa). Collections
  typically implement isequal by calling isequal recursively on all contents.

  Scalar types generally do not need to implement isequal separate from ==, unless
  they represent floating-point numbers amenable to a more efficient
  implementation than that provided as a generic fallback (based on isnan,
  signbit, and ==).
help?> hash
search: hash hasmethod haskey hasfield hasproperty skipchars Threads MathConstants

  hash(x[, h::UInt])

  Compute an integer hash code such that isequal(x,y) implies hash(x)==hash(y).
  The optional second argument h is a hash code to be mixed with the result.

  New types should implement the 2-argument form, typically by calling the
  2-argument hash method recursively in order to mix hashes of the contents with
  each other (and with h). Typically, any type that implements hash should also
  implement its own == (hence isequal) to guarantee the property mentioned above.
  Types supporting subtraction (operator -) should also implement widen, which is
  required to hash values inside heterogeneous arrays.
2 Likes

Thanks for your answer.
I am actually more confused now.

Compute an integer hash code such that isequal(x,y) implies hash(x)==hash(y).

This part of the documentation from hash() sounds to me like changing the (==) function should automatically change the hash function as well.
Also:

Typically, any type that implements hash should also
  implement its own == (hence isequal) to guarantee the property mentioned above.

this explains, that changing hash() also requires you to change (==). But nowhere is the inverse mentioned.

No, that is up to the programmer. That’s kind of the point, it cannot be done automatically.

The documentation of == does say:

If your type will be used as a dictionary key, it should therefore also implement hash.

but I agree that could be more clearly and prominently stated.

What phrasing of the documentation of == would have helped you avoid the misunderstanding? If we can come up with something better, then we can improve the docs and hopefully save the next person from going through the same trouble you did.

2 Likes