Inconsistent hashing of types containing `Set` or `Array` fields


struct TestType
    S::Set{Int}
end

t1 = TestType(Set([1,2]))
t2 = TestType(Set([1,2]))

Then I do

julia> hash(t1)

julia> hash(t2)

and get two different values. Well, that’s kind of annoying. At this point I’m starting to imagine reasons why this might be the case, but then I do

julia> hash(t1.S)

julia> hash(t2.S)

and those two values are identical.

I’ve now discovered that this also happens with types containing Array fields (only for custom types, of course; I didn’t notice until I created a few and tested them). Since that’s a pretty important use case, I’m imagining there is a good reason for this behavior. Any idea what that might be?
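For reference, the symptoms above can be reproduced in a few lines. The struct body below is a guess at the original definition (the post only shows that it has a field named S holding a Set):

```julia
# Guessed reconstruction of the struct from the original post:
# an immutable struct wrapping a Set in a single field S.
struct TestType
    S::Set{Int}
end

t1 = TestType(Set([1, 2]))
t2 = TestType(Set([1, 2]))

# The wrapper structs hash differently (identity-based fallback)...
println(hash(t1) == hash(t2))     # false

# ...even though the Set fields hash identically (content-based).
println(hash(t1.S) == hash(t2.S)) # true
```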

The fallback method for hash() just uses object_id. Try:

@edit hash(t1, UInt(0))

Even though t1 and t2 have the same contents, they are different non-bits structs so they have different identities. You might not see this effect with a bitstype struct because identical bitstype structs seem to have the same object_id():

julia> struct BitsType
           x::Int
       end

julia> b = BitsType(1)
BitsType(1)

julia> @edit hash(b, UInt(0))

julia> Base.object_id(b)

julia> Base.object_id(BitsType(1))

(both object_id calls print the same value)
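The same check as a plain script, using objectid (the current spelling; object_id is the pre-0.7 name used above) and a hypothetical one-field Int struct:

```julia
# Hypothetical one-field bits type, mirroring the REPL session above.
struct BitsType
    x::Int
end

b = BitsType(1)

# isbits immutables with equal contents are egal (===), so they share
# an objectid, and therefore the identity-based fallback hash agrees.
println(BitsType(1) === b)                    # true
println(objectid(BitsType(1)) == objectid(b)) # true
println(hash(BitsType(1)) == hash(b))         # true
```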

At a slightly higher level:

Your custom object could care more about behavior than contents. In your example, t1 and t2 are completely independent and can, in the future, diverge. This means that they are programmatically distinguishable, so we take the conservative route and say that they’re neither == nor hash-equal.

If you want them to behave as their contents do, then you need to decide which fields are important in their equality. It may be that not all fields are the same.

julia> struct TestType
           S::Set{Int}
       end

julia> t1 = TestType(Set([1,2]))
TestType(Set([2, 1]))

julia> t2 = TestType(Set([1,2]))
TestType(Set([2, 1]))

julia> t1 == t2
false

julia> push!(t1.S, 3)
Set([2, 3, 1])

julia> t1, t2
(TestType(Set([2, 3, 1])), TestType(Set([2, 1])))
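If content-based behavior is what you want, a minimal sketch (assuming the single field S is all that matters for equality) is to define == and hash as a pair:

```julia
struct TestType
    S::Set{Int}
end

# Content-based equality: two TestTypes are == when their S fields are.
Base.:(==)(a::TestType, b::TestType) = a.S == b.S

# hash must be consistent with ==: mix the field's (content-based) hash
# into the seed. The :TestType salt keeps a wrapper from hashing
# identically to the bare Set it wraps.
Base.hash(t::TestType, h::UInt) = hash(t.S, hash(:TestType, h))

t1 = TestType(Set([1, 2]))
t2 = TestType(Set([1, 2]))
println(t1 == t2)             # true
println(hash(t1) == hash(t2)) # true
```

Defining the two together matters: Dict and Set assume that x == y implies hash(x) == hash(y).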

I guess my concept of == needs adjusting. I was thinking of it as being like mathematical equality wherever possible. What you guys are talking about sounds more like === to me. Just because two objects may diverge at some future time t > t₀ doesn’t mean that they are not equal at t₀, so I’d expect == to reflect this.
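For what it’s worth, Base containers like Set do opt in to content-based ==, which is exactly why the struct default feels surprising:

```julia
a = Set([1, 2])
b = Set([1, 2])

# Set defines content-based equality...
println(a == b)  # true

# ...but the two objects are still distinct by identity:
println(a === b) # false

# The default struct == falls back to ===, which is why a wrapper
# around two such Sets compares unequal out of the box.
```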

I understand that performance could become an issue when hashing large data, but it seems to me that most users would be conscious of this. So again, I was expecting default hashing to respect the naive concept of equality, with users replacing the hash function themselves if it becomes too computationally intensive.

Welcome to [link to the long-standing GitHub issue on this]
Oh god, an open issue from 2013 :fearful:. I can see that I’ve walked into a minefield :laughing:.