I’m running into a problem with Dict
and I think it could be a bug? Julia’s documentation says that the implementation of Dict
uses hash
as the hashing function for the key, and isequal
to determine equality, but the following case seems to behave against the spec.
struct Node
children::Vector{Node}
end
node() = Node([])
children(n::Node) = n.children
Base.:(==)(n_1::Node, n_2::Node) = children(n_1) == children(n_2)
Base.hash(n::Node) = sum(hash, children(n); init=UInt64(1))
n_1 = node()
n_2 = node()
n_1 == n_2 # true
isequal(n_1, n_2) # true
hash(n_1) == hash(n_2) #true
d = Dict(n_1 => 1)
d[n_2] # KeyError
Stacktrace
ERROR: KeyError: key Node(Node[]) not found
Stacktrace:
[1] getindex(h::Dict{Node, Int64}, key::Node)
@ Base ./dict.jl:498
[2] top-level scope
@ REPL[37]:1
Julia Version: 1.10.0
Since the two nodes have the same hash and are equal, one expects looking up n_2
should work and output 1
. I don’t know if this behavior should be expected. If it is not, the problem seems to be that comparing empty vectors can have different results depending on the context.
SInce you didn’t implement 2 arg hash in code that uses it (like Dict
) that falls back to the default hash which uses objectref, and so they hash differently
julia> hash(n_1, UInt(0))
0x416534e228463c4d
julia> hash(n_2, UInt(0))
0x65ee65d3463e8fcc
You need to implement 2 argument hash
.
(don’t implement 1 argument hash)
e.g.
Base.hash(n::Node, h::UInt) = sum(hash, children(n); init=h)
(I am a little dubious of using sum
in hash and would consider reduce(xor
, but that is another discussion)
2 Likes
That seems to work. Thanks!
Note that your hash function ignores ordering (and the type itself) whereas the equality predicate cares about it — this may cause more hash collisions than you’d otherwise expect. Rather than sum
or reduce(xor
, you might want to reduce with the hash
function itself:
Base.hash(n::Node, h::UInt) = reduce((h,x)->hash(x,h), children(n); init=hash(Node, h))
2 Likes
Thanks for pointing this out. I’m not sure what ordering means here. Does it mean something like hash(n_1) < hash(n_2)
implies n_1 < n_2
?
No, I mean the positioning of the child nodes in the .children
vector. Your equality predicate defers to vector equality — which cares not just about the elements, but also the positions in which they appear. Taking the sum
(or xor
) of all the children’s hashes is a commutative operation, so it doesn’t care about positioning. Reducing over hash
fixes that… but in fact, you might as well just defer to the array’s hash implementation instead of re-inventing the wheel. Then it’s just:
Base.hash(n::Node, h::Uint) = hash(n.children, hash(Node, h))
The ordering makes sense and the hash does look like a better idea.