Use of hash code in integer hashing

I’m curious about the design of the integer hashing functions. Someone in my company coming from Python was annoyed to discover that incrementing the second argument h of hash doesn’t produce very different results, and there’s at least one package he was trying to use (the Bloom filters in Probably.jl) that assumes it would.

julia> hash(0x000000000796a326, UInt64(0))

julia> hash(0x000000000796a326, UInt64(1))

julia> hash(0x000000000796a326, UInt64(2))

julia> hash(0x000000000796a326, UInt64(3))

# from hashing.jl:
hash(x::Int64,  h::UInt) = hash_uint64(bitcast(UInt64, x)) - 3h
hash(x::UInt64, h::UInt) = hash_uint64(x) - 3h
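To see why incremented values of h give nearly identical results, here is a minimal sketch that mimics the shape of the two definitions above. It assumes a stand-in mixer, a hypothetical mix64 built from the splitmix64 finalizer, in place of Julia's internal hash_uint64; only the structure "mix x, then subtract 3h" is taken from hashing.jl.

```julia
# Hypothetical stand-in mixer (the splitmix64 finalizer). This is NOT
# Julia's internal hash_uint64; it only plays the same role here.
function mix64(x::UInt64)
    x = xor(x, x >> 30) * 0xbf58476d1ce4e5b9
    x = xor(x, x >> 27) * 0x94d049bb133111eb
    return xor(x, x >> 31)
end

# Same shape as `hash(x::UInt64, h::UInt) = hash_uint64(x) - 3h`:
# the seed h is subtracted after mixing, so it is never scrambled.
seeded_outside(x::UInt64, h::UInt64) = mix64(x) - 3h

x = 0x000000000796a326
results = [seeded_outside(x, UInt64(k)) for k in 0:3]

# Consecutive seeds give outputs exactly 3 apart (UInt64 arithmetic wraps mod 2^64).
diffs = [results[i] - results[i+1] for i in 1:3]
println(all(d -> d == 3, diffs))   # true
```

Because h bypasses the mixer, the difference between two outputs for the same x is always 3 times the difference of their seeds (mod 2^64), whatever x is.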

I’d like to be able to explain this design decision to him, and perhaps expand the documentation to warn about using the hash function that way.
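For the Bloom filter case, one hedged workaround is to follow the hash-code convention and pass a full hash of the probe index as the second argument, instead of the raw index. The helper bloom_indices below is hypothetical (it is not Probably.jl's API); it uses only Base.hash and maps each result onto one of m bit positions.

```julia
# Hypothetical helper, not Probably.jl's API: pick k bit positions
# (1-based) in a Bloom filter with m bits for the value `item`.
function bloom_indices(item, k::Integer, m::Integer)
    # Avoid hash(item, UInt(i)): under the integer definitions quoted
    # above, seeds 1, 2, ..., k only shift the output by multiples of 3.
    # Instead, use a well-mixed hash of i as the hash code for probe i.
    return [Int(hash(item, hash(i)) % UInt(m)) + 1 for i in 1:k]
end

println(bloom_indices(0x000000000796a326, 4, 1024))
```

Whether this is good enough for a particular filter's false-positive rate is worth measuring; double hashing (deriving probe i as h1 + i*h2 from two independent hashes) is a common alternative.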


On reflection, I think the intent is that h is supposed to be the output of a previous hash call (hence the name "hash code")?

Yes, it’s for mixing multiple hashes together.

julia> hash(0x000000000796a326, hash(0))

julia> hash(0x000000000796a326, hash(1))
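The usual pattern is to thread the hash code through each component of a composite value, so that every call receives an already well-mixed h. A minimal sketch with a hypothetical Point type (not part of Base):

```julia
# Hypothetical type used only for illustration.
struct Point
    x::Int
    y::Int
end

# Chain the hash code through a type tag and each field, so every
# component contributes to the result and h is always a real hash.
Base.hash(p::Point, h::UInt) = hash(p.x, hash(p.y, hash(:Point, h)))

# Keep == consistent with hash (the default == already compares the
# fields of this isbits struct; it is spelled out here for clarity).
Base.:(==)(a::Point, b::Point) = a.x == b.x && a.y == b.y

println(hash(Point(1, 2)) == hash(Point(1, 2)))                  # true
println(length(Set([Point(1, 2), Point(1, 2), Point(3, 4)])))   # 2
```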

It kind of seems like the - 3h could go inside the call to hash_uint64 instead of outside. You want to make sure the function is asymmetrical in its two arguments, but the factor of -3 already ensures that. The only downside I can see is that it could make it easier to craft an input that interacts badly with given hashes. It would be good to look at the history of this definition.
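A small experiment makes the trade-off concrete. It again assumes a hypothetical splitmix64-based mix64 as a stand-in for hash_uint64, and compares the current shape (subtract 3h after mixing) with the proposed one (subtract 3h before mixing). Moving the term inside does scatter neighboring seeds, but because x and 3h then combine linearly before mixing, (x + 3, h + 1) always collides with (x, h), which is one concrete instance of the crafted-input concern.

```julia
# Hypothetical stand-in mixer (splitmix64 finalizer), not Julia's hash_uint64.
function mix64(x::UInt64)
    x = xor(x, x >> 30) * 0xbf58476d1ce4e5b9
    x = xor(x, x >> 27) * 0x94d049bb133111eb
    return xor(x, x >> 31)
end

seeded_outside(x::UInt64, h::UInt64) = mix64(x) - 3h   # current shape
seeded_inside(x::UInt64, h::UInt64)  = mix64(x - 3h)   # proposed shape

x = 0x000000000796a326

# Outside: neighboring seeds stay exactly 3 apart.
println(seeded_outside(x, UInt64(0)) - seeded_outside(x, UInt64(1)) == 3)  # true

# Inside: neighboring seeds go through the mixer and land far apart.
println(seeded_inside(x, UInt64(0)), " ", seeded_inside(x, UInt64(1)))

# Downside: (x + 3, h + 1) reduces to the same mixer input as (x, h).
println(seeded_inside(x + 3, UInt64(1)) == seeded_inside(x, UInt64(0)))    # true
```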