Properly implement Base.hash() for custom type

anon60034542 · May 27, 2021, 5:56pm

Consider the following code:

@enum WeaponType CloseRange Reloadable

struct GameWeapon
    damage::Int32
    type::WeaponType
end

Reading the documentation, it says:

New types should implement the 2-argument form, typically by calling the 2-argument hash method recursively in order to mix hashes of the contents with each other (and with h ).

Is this how you would implement the Base.hash() based on the documentation?

function ==(x::GameWeapon, y::GameWeapon)
    return (x.damage === y.damage) &&
           (x.type === y.type)
end

function hash(weapon::GameWeapon, h::UInt = 0x01234567)
    hash_of_weapon_attribs = hash(hash(weapon.damage,h),hash(weapon.type,h))
    return hash(hash_of_weapon_attribs,h)
end

If not, how would you do it?

rdeits · May 27, 2021, 6:02pm

I don’t think there’s any reason to supply a default for the second argument (the base hash(x::Any) method in Base will do this for you). And you can probably simplify the implementation a bit to something more like:

function hash(weapon::GameWeapon, h::UInt)
  hash(weapon.type, hash(weapon.damage, h))
end

jzr · May 27, 2021, 6:08pm

This implementation seems very generic. Is there a sugar that defines this implementation for a given struct?

anon60034542 · May 28, 2021, 5:01pm

In the documentation it says:

New types should implement the 2-argument form, typically by calling the 2-argument hash method recursively

Shouldn’t weapon.type be hashed as well?

function hash(weapon::GameWeapon, h::UInt)
  return hash(hash(weapon.type,h), hash(weapon.damage, h))
end

fredrikekre · May 28, 2021, 5:54pm

There is AutoHashEquals.jl.

It is hashed, just not with h as seed but instead using the result of hash(weapon.damage, h) as the seed.

FWIW, AutoHashEquals.jl would define:

function Base.hash(g::GameWeapon, h::UInt)
    hash(g.type, hash(g.damage, hash(:GameWeapon, h)))
end

anon60034542 · May 28, 2021, 5:59pm

Is writing it the way AutoHashEquals.jl does any better than what rdeits suggested? I’m not criticizing his answer, I just want a better understanding of which method is better.

rdeits · May 28, 2021, 6:00pm

Hashing the type too (the way AutoHashEquals does) is probably a good idea, since it will prevent hash collisions between two different structs that happen to have the same fields. On reflection, I’d probably recommend what @fredrikekre wrote over what I posted above.

anon60034542 · May 28, 2021, 6:03pm

Out of curiosity, is there anything wrong with the way I did the hash()? The part of recursive may have confused me.

I see the problem, just as you mentioned.

@enum WeaponType CloseRange Reloadable

struct GameWeapon
    damage::Int32
    type::WeaponType
end

struct Weapon
    damage::Int32
    type::WeaponType
end

standard_sword = GameWeapon(25,CloseRange)
println(hash(standard_sword))

standard_sword = Weapon(25,CloseRange)
println(hash(standard_sword))

Both of these produce the same hash code. Given that they’re different struct’s they should produce different hash codes, correct?

fredrikekre · May 28, 2021, 6:19pm

Not necessarily, it just decreases the risk of hash collisions. Even though they have the same hash you can use both as separate keys in Dict for example

julia> g1 = GameWeapon(25,CloseRange);

julia> g2 = Weapon(25,CloseRange);

julia> hash(g1)
0x4f2c71de34f6c410

julia> hash(g2)
0x4f2c71de34f6c410

julia> d = Dict();

julia> d[g1] = "g1";

julia> d[g2]
ERROR: KeyError: key Weapon(25, CloseRange) not found

julia> d[g2] = "g2";

julia> d
Dict{Any, Any} with 2 entries:
  GameWeapon(25, CloseRange) => "g1"
  Weapon(25, CloseRange)     => "g2"

There is also for example

julia> hash(1)
0x5bca7c69b794f8ce

julia> hash(1.0)
0x5bca7c69b794f8ce

although this is an Int and a Float64. However, in this case you can’t have them as two keys in a Dict which is a bit weird, not sure what is up with that.

julia> d = Dict();

julia> d[1] = "Int"
"Int"

julia> d[1.0]  # !!!
"Int"

anon60034542 · May 28, 2021, 6:27pm

Ah! Thanks for the explanation.

My problem here is that I was confused about this part:

New types should implement the 2-argument form, typically by calling the 2-argument hash method
recursively in order to mix hashes of the contents with each other (and with h ).

I don’t mind using AutoHashEquals.jl, but I would like to understand for myself how to properly implement Base.hash() according to the documentation.

The part I misunderstood was the recursive.

It seems to produce a unique hash, AutoHashEquals.jl combines the struct name as a symbol along with the required fields, correct?

fredrikekre · May 28, 2021, 6:44pm

It would be great to add an example to make this clear then, do you want to contribute with that? Here is the docstring.

Yes, but there is still no guarante it will be a unique value, e.g.

module A
struct Foo
    x::Int
end
end

module B
struct Foo
    x::Int
end
end

would give the same hash for A.Foo(1) and B.Foo(1). I don’t think it matters that much though.

anon60034542 · May 28, 2021, 7:32pm

It’s funny you mention that, because I suggested it a few months ago .

I’m still learning so I don’t think that I’d be able to produce an accurate example. Perhaps someone with deep knowledge of Julia might update the documentation.

Topic		Replies	Views
Properly implement Base.hash() for custom type follow up General Usage	11	1096	June 1, 2021
Would it be possible to update the documentation to include how to implement isequal() and hash() General Usage	14	980	January 19, 2021
Weird hash behavior General Usage question	4	674	May 22, 2017
Alternatives to `Base.hash`? General Usage	14	950	April 11, 2022
Fast hash for custom type General Usage question , performance , hash	10	882	September 17, 2021

Properly implement Base.hash() for custom type

Related topics