Different Dicts hashing to the same value

samuel-massinon · January 12, 2017, 4:09pm

While trying to do a puzzle problem, I notice that two different Dicts resulted in the same hash(...) value, even though the Dicts are not equivalent, as shown below.

julia> VERSION
v"0.5.0"

julia> a = Dict(Dict(3 => 4, 2 => 3) => 2, Dict(1 => 2, 5 => 6) => 1)
Dict{Dict{Int64,Int64},Int64} with 2 entries:
  Dict(5=>6,1=>2) => 1
  Dict(2=>3,3=>4) => 2

julia> b = Dict(Dict(1 => 2, 2 => 3, 5 => 6) => 1, Dict(3 => 4) => 2)
Dict{Dict{Int64,Int64},Int64} with 2 entries:
  Dict(3=>4)           => 2
  Dict(2=>3,5=>6,1=>2) => 1

julia> hash(a)
0x2a9d36de3d0d9307

julia> hash(b)
0x2a9d36de3d0d9307

julia> isequal(a, b)
false

julia> c = Dict(Dict() => 1, Dict(3 => 4, 1 => 2, 5 => 6, 2 => 3) => 2)
Dict{Any,Int64} with 2 entries:
  Dict{Any,Any}()           => 1
  Dict(2=>3,3=>4,5=>6,1=>2) => 2

julia> hash(c)
0x2a9d36de3d0d9307

The same thing happens in version 0.6.0-dev.1658

Is this expected behaviour or a bug?

stevengj · January 12, 2017, 5:25pm

I think this is a bug; I filed it as https://github.com/JuliaLang/julia/issues/19995

ScottPJones · January 13, 2017, 4:41am

It’s not really a bug.
There’s an issue here, because the algorithm for calculating a hash value for the abstract Associative works by simply XORing the results of hashing the keys using the hash of the value as the seed, i.e. hash(key, hash(value)), and that seems to end up frequently with collisions.

const hasha_seed = UInt === UInt64 ? 0x6d35bb51952d5539 : 0x952d5539
function hash(a::Associative, h::UInt)
    h = hash(hasha_seed, h)
    for (k,v) in a
        h ⊻= hash(k, hash(v))
    end
    return h
end

I imagine that was done so that two dictionaries with the same contents, but different orders, would have the same hash value.

stevengj · January 13, 2017, 1:22pm

When you iterate through the (key,value) pairs in a dictionary, the order is undefined, so the hash cannot depend on this, hence the xor. However, the hash should depend on which keys go with which values. It does for most keys, but apparently not for dictionary keys, and this seems like a bug in the algorithm.

ScottPJones · January 13, 2017, 3:27pm

I guess that depends on your definition of “bug”.

To me, a bug is when you get incorrect results, or there is an inconsistency between the documentation and the results (which could be a bug in the code, the documentation, or both). An inconsistency between behavior of different types/methods where a reasonable programmer might expect them to be consistent might indicate a bug, or some possibly undocumented design decision (in which case, it’s a documentation issue).

That’s not the case here (a good example of what I felt really was a bug with Dicts was #15077, which was fixed by https://github.com/JuliaLang/julia/pull/15176).

Having these collide could potentially cause a performance issue, but the results would still be correct.

Note, that this problem / performance issue happens for all subtypes of Associative, unless they have implemented their own hash method (and haven’t copied the same flaw).

I already have a “fix” ready for this, in https://github.com/ScottPJones/julia/tree/spj/fixasshash if somebody could make it into a PR. It also has tests added.

ScottPJones · January 13, 2017, 4:00pm

Thanks very much for making a PR out of that!

ScottPJones · January 13, 2017, 11:15pm

There seems to be an unrelated failure (with a Pkg test) on AppVeyor - it had already gotten past the Dict unit tests.

Topic		Replies	Views
Same types have different hash General Usage question	1	333	November 5, 2019
Hash of Dict with custom type as keys General Usage dictionary	4	1363	September 20, 2021
Dictionary has same key more than once New to Julia dictionary	6	355	April 28, 2023
Dictionary lookup errors when the struct contains empty vectors General Usage question , dictionary	6	261	February 9, 2024
Comparing Dictionaries of Dictionaries General Usage question , dictionaries	16	458	February 17, 2024

Different Dicts hashing to the same value

Related topics