Fast hash for custom type

We have a custom type and we are not satisfied with the Base.hash performance. See here for the real world example.
What are the best practice for defining a custom hash? This is what we did:


julia> struct S{T}; value::T; end

julia> s = S(1)
S{Int64}(1)

julia> seed = UInt(1)
0x0000000000000001

julia> using BenchmarkTools; @btime hash($s, $seed)
  6.990 ns (0 allocations: 0 bytes)
0xbee890e79d476a32

julia> objectid(S)
0x3ae5cb9a575935b6

julia> function myhash(s, seed)
           salt = 0x3ae5cb9a575935b6
           hash(s.value, hash(salt, seed))
       end
myhash (generic function with 1 method)

julia> @btime myhash($s, $seed)
  0.020 ns (0 allocations: 0 bytes)
0x85f4c910a0ba7fed

Is this a sane approach? Any problems with this?

from the docs

hash(x[, h::UInt]) → UInt

Compute an integer hash code such that isequal(x,y) implies hash(x)==hash(y).
The optional second argument h is a hash code to be mixed with the result.

New types should implement the 2-argument form, typically by calling the
2-argument hash method recursively in order to mix hashes of the contents with
each other (and with h).

e.g. (using these constants)

const lens_seed = UInt === UInt64 ? 0x8aa710fd8cb95e7e : 0x4fdb9639

function Base.hash(x::Setfield.IndexLens{I}, h::UInt) where {I<:Tuple}
   h += lens_seed
   return hash(x.indices, h)
end
1 Like

That time looks like constant-folded compiler shenanigans to me. Try with

@btime myhash($(Ref(s))[], $(Ref(seed))[]) 

instead.

1 Like

The same goes for the benchmarks in the linked issue imo.

That time looks like constant-folded compiler shenanigans to me.

Right! (though I was happy to see, that in contrast to the default hash, the compiler can constant fold this one.)

julia> @btime myhash($(Ref(s))[], $(Ref(seed))[])
  1.793 ns (0 allocations: 0 bytes)

What’s the timing for regular hash on your system? With the same Ref trick of course.


julia> using BenchmarkTools; @btime hash($s, $seed)
  6.990 ns (0 allocations: 0 bytes)
julia> using BenchmarkTools; @btime hash($(Ref(s))[], $(Ref(seed))[])
7.301 ns (0 allocations: 0 bytes)

hash calles objectid, which does a ccall. It seems this ccall is the bottle neck.

@JeffreySarnoff How did you come up with the values for the constant in the 32bit case and why
h += lens_seed and not h = hash(lens_seed, h) ?

const lens_seed = UInt === UInt64 ? 0x8aa710fd8cb95e7e : 0x4fdb9639

function Base.hash(x::IndexLens{I}, h::UInt) where {I<:Tuple}
   h += lens_seed
   return hash(x.indices, h)
end

The 64-bit constant comes from you.

julia> s64 = 0x8aa710fd8cb95e7e
0x8aa710fd8cb95e7e

julia> s32 = UInt32(s64 >> 32) ⊻ UInt32(s64 & 0x00000000ffffffff)
0x061e4e83  # Palli corrected this (see next in thread)

They can be whatever you prefer, although they should not be known to clash with another type’s similar constants.

h += lens_seed is faster than h = hash(lens_seed, h) and all you want to do before calling the inner hash(x.indicies, h) is to morph the incoming h in some way that is consistent for IndexLens. I copied the += approach from Base/hashing.jl, for hashing a String and from Base/tuples.jl hashing.

2 Likes

Was this supposed to be:

julia> s32 = UInt32(s64 >> 32) ⊻ UInt32(s64 & 0x00000000ffffffff)
0x061e4e83

In Julia ^ is always exponentiation, as I think you may know, not xor like in some other languages. I’m just making sure, maybe any mixing works. [You can also use the xor function or Unicode operator type \xor tab.]

1 Like

Yesforgetting to forget C