Should custom hash functions distinguish types?

Suppose my package defines two struct types A and B, each containing just a single field called data. Then I overload Base.hash as follows:

import Base: hash

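# Annotate the second argument as UInt to avoid a method ambiguity with hash(::Any, ::UInt)
hash(a::A, h::UInt) = hash(a.data, h)

hash(b::B, h::UInt) = hash(b.data, h)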

This is OK for my own use cases, since I use dictionaries containing only keys of type A or only keys of type B. However, a future user of the package may want to construct a Dict{Union{A, B}, T}, where the keys can be of either type. This becomes problematic because the hash function doesn’t distinguish the two types, so the hashes will collide whenever a.data and b.data are the same.

Is there any style guide or informal advice regarding such practices with custom hash functions?

It depends on whether you consider the type important for equality or not. If you do, you have to incorporate the type into the hash as well; if you don’t, you don’t need to.
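One way to incorporate the type into the hash is to mix a per-type tag into the seed before hashing the field. The following is only a minimal sketch: the type names TaggedA and TaggedB are hypothetical stand-ins for A and B, and using the type's name as a Symbol tag is one possible choice, not an established convention.

```julia
# Hypothetical stand-ins for the A and B types from the question.
struct TaggedA
    data::Int
end
struct TaggedB
    data::Int
end

# Equality is defined within each type only; comparing a TaggedA with a
# TaggedB falls back to the generic ==, which returns false.
Base.:(==)(x::TaggedA, y::TaggedA) = x.data == y.data
Base.:(==)(x::TaggedB, y::TaggedB) = x.data == y.data

# Mix a type-specific Symbol into the seed h, then hash the field with it.
# Values that compare equal still hash equally, as hash requires.
Base.hash(x::TaggedA, h::UInt) = hash(x.data, hash(:TaggedA, h))
Base.hash(x::TaggedB, h::UInt) = hash(x.data, hash(:TaggedB, h))

# Same data, different types: both keys coexist in one dictionary.
d = Dict{Union{TaggedA, TaggedB}, String}(
    TaggedA(1) => "a",
    TaggedB(1) => "b",
)
```

With this definition the hashes of TaggedA(1) and TaggedB(1) should differ except for an accidental collision, so mixed-key dictionaries avoid systematic collisions between the two types.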


Hash equality does not imply object equality; see the docstrings for hash and isequal, for example.

If the hash doesn’t distinguish between your A and B types above, that only means a higher probability of hash collisions. It is still fine to have A(1) and B(1) in the dictionary; they will hash to the same value, but then isequal will distinguish them.

julia> struct A
           data::Int
       end
       Base.:(==)(a1::A, a2::A) = a1.data == a2.data
       Base.hash(a::A, h::UInt) = hash(a.data, h)

       struct B
           data::Int
       end
       Base.:(==)(b1::B, b2::B) = b1.data == b2.data
       Base.hash(b::B, h::UInt) = hash(b.data, h)

julia> a = A(1); b = B(1);

julia> hash(a) == hash(b)
true

julia> a == b
false

julia> Set((a, b))
Set{Any} with 2 elements:
  A(1)
  B(1)
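
The same reasoning applies to the Dict{Union{A, B}, T} case from the question. As a rough sketch (KeyA and KeyB are hypothetical names mirroring A and B, and String is just an arbitrary value type), two keys with identical hashes still occupy separate entries because isequal tells them apart:

```julia
# Hypothetical stand-ins for A and B, with hashes that ignore the type.
struct KeyA
    data::Int
end
struct KeyB
    data::Int
end

Base.:(==)(x::KeyA, y::KeyA) = x.data == y.data
Base.:(==)(x::KeyB, y::KeyB) = x.data == y.data
Base.hash(x::KeyA, h::UInt) = hash(x.data, h)
Base.hash(x::KeyB, h::UInt) = hash(x.data, h)

dict = Dict{Union{KeyA, KeyB}, String}()
dict[KeyA(1)] = "from A"
dict[KeyB(1)] = "from B"   # same hash as KeyA(1), but not isequal to it

length(dict)      # 2
dict[KeyA(1)]     # "from A"
dict[KeyB(1)]     # "from B"
```

The only cost is that lookups for colliding keys need an extra isequal comparison, which matters only if such collisions are frequent.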