Dictionary lookup errors when the struct contains empty vectors

I’m running into a problem with Dict and I think it could be a bug? Julia’s documentation says that the implementation of Dict uses hash as the hashing function for the key, and isequal to determine equality, but the following case seems to behave against the spec.

struct Node
    children::Vector{Node}
end

node() = Node([])
children(n::Node) = n.children

Base.:(==)(n_1::Node, n_2::Node) = children(n_1) == children(n_2)
Base.hash(n::Node) = sum(hash, children(n); init=UInt64(1)) 

n_1 = node()
n_2 = node()

n_1 == n_2 # true
isequal(n_1, n_2) # true
hash(n_1) == hash(n_2) #true

d = Dict(n_1 => 1)
d[n_2] # KeyError

Stacktrace

ERROR: KeyError: key Node(Node[]) not found
Stacktrace:
 [1] getindex(h::Dict{Node, Int64}, key::Node)
   @ Base ./dict.jl:498
 [2] top-level scope
   @ REPL[37]:1

Julia Version: 1.10.0

Since the two nodes have the same hash and are equal, one expects looking up n_2 should work and output 1. I don’t know if this behavior should be expected. If it is not, the problem seems to be that comparing empty vectors can have different results depending on the context.

SInce you didn’t implement 2 arg hash in code that uses it (like Dict) that falls back to the default hash which uses objectref, and so they hash differently

julia> hash(n_1, UInt(0))
0x416534e228463c4d

julia> hash(n_2, UInt(0))
0x65ee65d3463e8fcc

You need to implement 2 argument hash.
(don’t implement 1 argument hash)

e.g.

Base.hash(n::Node, h::UInt) = sum(hash, children(n); init=h) 

(I am a little dubious of using sum in hash and would consider reduce(xor, but that is another discussion)

2 Likes

That seems to work. Thanks!

Note that your hash function ignores ordering (and the type itself) whereas the equality predicate cares about it — this may cause more hash collisions than you’d otherwise expect. Rather than sum or reduce(xor, you might want to reduce with the hash function itself:

Base.hash(n::Node, h::UInt) = reduce((h,x)->hash(x,h), children(n); init=hash(Node, h))
2 Likes

Thanks for pointing this out. I’m not sure what ordering means here. Does it mean something like hash(n_1) < hash(n_2) implies n_1 < n_2?

No, I mean the positioning of the child nodes in the .children vector. Your equality predicate defers to vector equality — which cares not just about the elements, but also the positions in which they appear. Taking the sum (or xor) of all the children’s hashes is a commutative operation, so it doesn’t care about positioning. Reducing over hash fixes that… but in fact, you might as well just defer to the array’s hash implementation instead of re-inventing the wheel. Then it’s just:

Base.hash(n::Node, h::Uint) = hash(n.children, hash(Node, h))

The ordering makes sense and the hash does look like a better idea.