Inconsistent hashing of types containing `Set` or `Array` fields


#1

Consider

struct TestType
    S::Set{Int}
end

t1 = TestType(Set([1,2]))
t2 = TestType(Set([1,2]))

Then I do

julia> hash(t1)
0x9e50ff0d6c1bbf02

julia> hash(t2)
0x8a1b971008dc34e8

Well, that’s kind of annoying, but at this point I’m starting to imagine reasons why this might be the case, but then I do

julia> hash(t1.S)
0xb612ae5b24500075

julia> hash(t2.S)
0xb612ae5b24500075

I’ve now discovered that this happens with types containing Array (only custom types of course! I didn’t notice this until I created a few and tested them). Since that’s a pretty important use case, I’m imagining there is a good reason for this behavior. Any idea what that might be?


#2

The fallback method for hash() just uses object_id. Try:

@edit hash(t1, UInt(0))

Even though t1 and t2 have the same contents, they are different non-bits structs so they have different identities. You might not see this effect with a bitstype struct because identical bitstype structs seem to have the same object_id():

julia> struct BitsType
         x::Int
       end

julia> b = BitsType(1)
BitsType(1)

julia> @edit hash(b, UInt(0))

julia> Base.object_id(b)
0x0df673c5a50bcfa2

julia> Base.object_id(BitsType(1))
0x0df673c5a50bcfa2

#3

At a slightly higher level:

Your custom object could care more about behavior than contents. In your example, t1 and t2 are completely independent, and can — in the future — diverge. This means that they’re not programmatically indistinguishable… so we take the conservative route and say that they’re neither == nor do they hash equal.

If you want them to behave as their contents do, then you need to decide which fields are important in their equality. It may be that not all fields are the same.

julia> struct TestType
           S::Set{Int}
       end

julia> t1 = TestType(Set([1,2]))
TestType(Set([2, 1]))

julia> t2 = TestType(Set([1,2]))
TestType(Set([2, 1]))

julia> t1 == t2
false

julia> push!(t1.S, 3)
Set([2, 3, 1])

julia> t1, t2
(TestType(Set([2, 3, 1])), TestType(Set([2, 1])))

#4

I guess my concept of == needs adjusting. I was thinking of it as being like mathematical = wherever possible. What you guys are talking about sounds more like === to me. Just because two objects may not be equal at some future t > t_{0} doesn’t mean that they are not equal at t_{0}, so I’d expect == to reflect this.

I understand that performance could get to be an issue here for hashing of large data, but it seems to me that most users would be conscious of this, so again, I was expecting default hashing to respect the naive concept of equality, and that it would be up to users to replace the hash function if it becomes too computationally intensive.


#5

Welcome to https://github.com/JuliaLang/julia/issues/4648.


#6

Oh god, an open issue from 2013 :fearful:. I can see that I’ve walked into a minefield :laughing:.