Properly implement Base.hash() for custom type follow up

Before my previous post gets bigger and confusing, I decided to try and implement my own custom hash() method based on the documentation and what was suggested in my previous post.

I’m aware of AutoHashEquals.jl, but I would like a basic understanding of hashing before using it.

From the documentation:

New types should implement the 2-argument form, typically by calling the 2-argument hash method recursively in order to mix hashes of the contents with each other (and with h ).

From my understanding of the documentation here is what I came up with:

@enum WeaponType CloseRange Reloadable

struct GameWeapon
    damage::Int32
    type::WeaponType
end

function ==(x::GameWeapon, y::GameWeapon)
    return (x.damage === y.damage) &&
           (x.type === y.type)
end

function hash(weapon::GameWeapon, h::UInt=0x12345467)
    hash_of_type = hash(weapon.type,h)
    hash_of_damage = hash(weapon.damage,h)
    return hash((hash_of_type,hash_of_damage),h)
end

I call the 2-argument hash() method recursively to obtain the hashes for each field, create a tuple and hash that with h or as the documentation put it:

mix hashes of the contents with each other (and with h )

main.jl:

standard_sword = GameWeapon(25,CloseRange)
println(hash(standard_sword)) # prints 272133118
  1. Is their anything wrong with this approach? Have I misinterpreted the documentation?

The nice part about providing the h argument is that if i had another struct with the same field it won’t produce the same hash.

struct Weapon
    damage::Int32
    type::WeaponType
end

function hash(weapon::Weapon, h::UInt=0x7654321)
    hash_of_type = hash(weapon.type,h)
    hash_of_damage = hash(weapon.damage,h)
    return hash((hash_of_type,hash_of_damage),h)
end

main.jl:

standard_sword = Weapon(25,CloseRange)
println(hash(standard_sword)) # prints 1618513225

Of course none of this means I’ve written an adequate hash() method, it could all be wrong.

This maybe better in another post, but it would mean re-posting the same code again, so I’ll ask it here.

I noticed some strange behaviour with the hash() method and the h argument and I was wondering if anyone can explain.

Providing the h:

function hash(weapon::GameWeapon, h::UInt=0x12345467)
    println("Provided Unsigned Int: $(h)")
end

function hash(weapon::Weapon, h::UInt=0x7654321)
    println("Provided Unsigned Int: $(h)")
end

produces a different result in the println() because when I call hash(standard_sword) it generates its own unique unsigned integer and then calls the custom 2-argument hash(), overwriting the default h I provided, correct?

Provided Unsigned Int: 305419367
Provided Unsigned Int: 124076833

If I were to remove the value for the h argument and write the methods like this:

function hash(weapon::GameWeapon, h::UInt)
    println("Provided Hash: $(h)")
    hash_of_type = hash(weapon.type,h)
    hash_of_damage = hash(weapon.damage,h)
    return hash((hash_of_type,hash_of_damage),h)
end

function hash(weapon::Weapon, h::UInt)
    println("Provided Hash: $(h)")
    hash_of_type = hash(weapon.type,h)
    hash_of_damage = hash(weapon.damage,h)
    return hash((hash_of_type,hash_of_damage),h)
end

h is 0 and both methods produce the same hash value.

  1. Why is it that when I provide the h value the following happens:
  • A unique unsigned integer is generated and used to overwrite the default h value provided.

  • As a result of the unique unsigned integer, a unique hash is generated, ensuring that two structs that have the same fields won’t produce the same hash.

No, it doesn’t produce a different result at all. You are simply providing the unsigned integer as a haxadecimal (that’s what 0x does), which is then printed in decimal:

julia> println(0x12345467)
305419367

julia> string(0x12345467, base=16)
"12345467"

julia> string(0x12345467, base=10)
"305419367"

Not providing a default h is the correct way. To make different structs have different hashes, hash the type itself as well, e.g.

function hash(weapon::GameWeapon, h::UInt)
    h = hash(weapon.type, h)
    h = hash(weapon.damage, h)
    return hash(GameWeapon, h) # hash the type itself to avoid hash collisions with `Weapon`
end

Note that this overwrites h when hashing each field, which is what the “recursively” in the documentation refers to.

2 Likes

Does it say that anywhere in the documentation? I’m not saying you’re wrong, I just want clarification why not providing the h value is correct.

If you write a function as

function hash(weapon::GameWeapon, h::UInt=0x12345467)
    ...
end

you are effectively writing two methods, one with two arguments and another with one argument. The documentation is advising that you only define the two argument version, meaning you shouldn’t provide a default for the second argument.

2 Likes

Bare with me here cause I’m still learning :smiley:

When you call the hash() without the default h value (the 2 argument method), how does the h get assigned?

Right here:

2 Likes

It’s probably best to let that handle producing the h value, correct?

Is hashing everything and placing it inside a tuple and hashing that a good approach (as shown my first example)? I can always add hashing the type to the tuple to produce a unique hash.

It is mentioned in the hash docstring:

New types should implement the 2-argument form

The problem with providing your own default h is that it may be ignored by other hash methods. For example, hashing an array will hash some of its elements. When you put your GameWeapon in an array and hash the array, your default h will not be used for that (the two-argument hash(::GameWeapon, h::UInt) will be called with a different h), which makes the distinction between GameWeapon and Weapon go away in that case.

2 Likes

Ah! Thanks for the explanation.

Is my use of the tuple to collect all the hashes okay? It’s how I interpreted (or misinterpreted) the recursive part of the documentation.

Taking what you said what hashing the type, I could restructure my hash() method like this:

function hash(weapon::GameWeapon, h::UInt)
    hash_of_type = hash(weapon.type,h)
    hash_of_damage = hash(weapon.damage,h)
    hash_of_struct = hash(GameWeapon,h)
    return hash((hash_of_type,hash_of_damage,hash_of_struct),h)
end

Is there anything wrong with this method?

Only that it probably does more work than necessary. It would be cheaper to do:

function hash(weapon::GameWeapon, h::UInt)
    h = hash(weapon.type,h)
    h = hash(weapon.damage,h)
    return hash(GameWeapon,h)
end
8 Likes

Besides being less efficient than the recursive version, the variable hash_of_type is used here for both weapon.type and GameWeapon.

1 Like

Ah! foolish mistake, let me correct it.

function hash(weapon::GameWeapon, h::UInt)
    hash_of_type = hash(weapon.type,h)
    hash_of_damage = hash(weapon.damage,h)
    hash_of_struct = hash(GameWeapon,h)
    return hash((hash_of_type,hash_of_damage,hash_of_struct),h)
end

Okay, now it’s just less efficient.