Why does the hash() method produce the same hash after a field value changes?

Consider the following code:

mutable struct Person
    name::String
    age::Int32
end

svetlana = Person("Queen Svetlana",25)
println(hash(svetlana)) # 1143118284

svetlana.age = 30
println(hash(svetlana)) # 1143118284
  1. Why does hash() produce the same hash even after the value of the age field changes?
  2. Does Set() use hashes to find objects? Even if you don’t implement your own equals() method, it’s still able to find it.
svetlana = Person("Queen Svetlana",25)
people = Set{Person}([svetlana])
println(svetlana in people) # true

I’m also able add two objects, that on the surface look the same but produce two different hashes, to a Set():

svetlana = Person("Queen Svetlana",25)
svetlana_copy = Person("Queen Svetlana",25)
people = Set{Person}([svetlana,svetlana_copy])
println(hash(svetlana)) # 1669566213
println(hash(svetlana_copy)) # 2683020012

ABOUT QUESTION 1:

The stack seems to be:

hash(x::Any) = hash(x, zero(UInt))
hash(@nospecialize(x), h::UInt) = hash_uint(3h - objectid(x))
objectid(@nospecialize(x)) = ccall(:jl_object_id, UInt, (Any,), x)

# C code
JL_DLLEXPORT uintptr_t jl_object_id(jl_value_t *v) JL_NOTSAFEPOINT
{
    return jl_object_id_(jl_typeof(v), v);
}

JL_DLLEXPORT inline uintptr_t jl_object_id_(jl_value_t *tv, jl_value_t *v) JL_NOTSAFEPOINT
{
    jl_datatype_t *dt = (jl_datatype_t*)tv;
    if (dt == jl_symbol_type)
        return ((jl_sym_t*)v)->hash;
    if (dt == jl_typename_type)
        return ((jl_typename_t*)v)->hash;
    if (dt == jl_datatype_type) {
        jl_datatype_t *dtv = (jl_datatype_t*)v;
        if (dtv->isconcretetype)
            return dtv->hash;
    }
    return jl_object_id__cold(dt, v);
}

It seems to me that a hash is created when the object is created, and it is never updated after. I checked and creating the object directly with field age of value 30 leads to a different hash:

julia> hash(svetlana)
0x4589effdd2b8230e

julia> hash(svetlana2)
0x9aa4ae20c2343a5a

This behavior seems optimal for immutable types, but not for mutable ones.

You should probably define your own hash function, or use something like AutoHashEquals.jl which has a macro that takes the struct definition and automatically defines hash and isequal for it.

ABOUT QUESTION 2:

Yes, Set probably uses hash (or isequal) and you should not use mutable structs as keys of hashing containers (in the Set the element is both value and key) unless you are sure they will not change, or is prepared to deal with them changing.

In the case you change them, they will probably keep stored by their old hash value, and if you search for them with an object created with the new values, then you will not find the stored objects. It can become impossible to find them by their old value also, because they are not only searched by hash, but compared by value before being returned.

3 Likes

Right, hash by default just uses the object ID. That’s consistent with the default behavior of == which falls back to === and thus also uses the object ID. The important invariant is that x == y implies hash(x) == hash(y), and that invariant holds here.

3 Likes

So, == and hash should always be specialized for some type together? To avoid breaking the invariant, I mean.

I would assume so, the documentation says if you implement a custom equals() you should also implement a custom hash().

Although someone might be able to provide more details.

Is it possible to detect if there are any types that violate this condition in my code?

(a) use AutoHashEquals.jl
(b) Best to review your own code and make certain that you define the custom equals in terms of the custom hash for each of your own types.
(c) Otherwise, the only way to be certain would be to test every possible value of each type.

1 Like

Isn’t it possible to look at all types that have a custom equals and check that they have a custom hash?

You need to assure that equals(x,y) === (hash(x) === hash(y)). The types may have a custom equals and a custom hash without that condition holding.

Yeah it would have to be a heuristic.