Defining a custom hash function

jules · December 20, 2021, 7:24pm

I have a type BezierPath that contains a vector of commands. I want two bezier paths with the same commands stored in different vectors to have the same hash value; by default they don't. Is it fine to just overload hash to operate on the vector content? I don't see how that could have negative consequences, other than that it's no longer possible to use two different instances as separate keys in a dict if they have the same commands. But that's kind of the point.

jzr · December 20, 2021, 8:08pm

Is that true? I would think they can be separate keys because equality should still be checked for lookup.

julia> mutable struct S end

julia> a = S(); b = S(); a === b
false

julia> Set([a,b])
Set{S} with 2 elements:
  S()
  S()

julia> Base.hash(::S)::UInt = 1

julia> Set([a,b])
Set{S} with 2 elements:
  S()
  S()

cjdoris · December 20, 2021, 8:29pm

You can overload hash for your own types however you like, but you must also overload == so that the two agree. See the hash docstring for details.
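A minimal sketch of what this looks like, assuming a hypothetical BezierPath whose commands are stored as a Vector{Symbol} (the real command type is not shown in the thread). It extends the two-argument hash(x, h::UInt) method, which is the form the hash docstring asks new types to implement, together with ==, so that isequal (which falls back to ==) and hash agree:

```julia
# Hypothetical stand-in for the BezierPath type from the question;
# the command type is assumed to be Symbol here.
struct BezierPath
    commands::Vector{Symbol}
end

# Equality by content: compare the command vectors element-wise.
Base.:(==)(a::BezierPath, b::BezierPath) = a.commands == b.commands

# Hash by content, mixing in a type-specific salt so a BezierPath does not
# hash exactly like a bare Vector with the same elements.
Base.hash(p::BezierPath, h::UInt) = hash(p.commands, hash(:BezierPath, h))

p1 = BezierPath([:move, :line, :close])
p2 = BezierPath([:move, :line, :close])   # same commands, different vector

println(p1 == p2)                # true
println(hash(p1) == hash(p2))    # true
println(length(Set([p1, p2])))   # 1
```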

Don't overload hash for Vector, though: that would be type piracy and would break other code.

simeonschaub · December 20, 2021, 8:58pm

You need to ensure hash and isequal are compatible, i.e. isequal(x, y) must always imply hash(x) == hash(y). Otherwise the behavior of Sets and Dicts is not well defined. In your case, you are actually creating a hash collision, which is generally fine (although your particular hash function would of course completely defeat the purpose of a hash map), but since the two objects are not isequal, two separate entries will still be created.
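One way to make the requirement concrete is a small test helper that checks, for every pair in a sample of values, that isequal(x, y) implies hash(x) == hash(y). The name hash_consistent is hypothetical (it is not part of Base), and it is only a spot check over the values you pass in, not a proof:

```julia
# Hypothetical helper (not part of Base): returns true if, for every pair in
# `xs`, isequal(x, y) implies hash(x) == hash(y).
function hash_consistent(xs)
    for x in xs, y in xs
        if isequal(x, y) && hash(x) != hash(y)
            return false
        end
    end
    return true
end

# Built-in values satisfy the contract, including values that are isequal
# across types (1, 1.0, 0x01) and the special float cases NaN and -0.0.
println(hash_consistent(Any[1, 1.0, 0x01, NaN, -0.0, 0.0, [1, 2], [1.0, 2.0]]))  # true
```

Running the same check over a sample of your own custom-type values is a cheap way to catch a hash that is finer than isequal.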

More precisely, hash only needs to be compatible with isequal. isequal falls back to == by default, but == is afforded some more liberties for floating-point values: NaN != NaN even though isequal(NaN, NaN) is true, and -0.0 == 0.0 even though isequal(-0.0, 0.0) is false.
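The difference between == and isequal for floats can be checked directly in the REPL. Dict keys are compared with isequal, so NaN can be looked up as a key, and 0.0 and -0.0 are stored as two separate keys:

```julia
# == follows IEEE 754 semantics for floats; isequal is the relation
# that hash must agree with.
println(NaN == NaN)            # false
println(isequal(NaN, NaN))     # true
println(-0.0 == 0.0)           # true
println(isequal(-0.0, 0.0))    # false

# Dict keys are compared with isequal, not ==.
d = Dict(NaN => "nan", 0.0 => "zero", -0.0 => "minus zero")
println(d[NaN])                # nan
println(length(d))             # 3
```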

jzr · December 20, 2021, 9:08pm

The OP's question is whether it is safe to coarsen the hash, i.e. make more values share a hash. Coarsening is always safe but may reduce performance; refining is the risky operation, because it can give isequal values different hashes.
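To illustrate why coarsening is safe, here is a hypothetical CoarsePath type whose hash deliberately ignores the commands and returns the same value for every instance. Because Dict still checks isequal on every candidate key, lookups stay correct; all keys simply share one probe sequence, so operations degrade toward linear time as the Dict grows:

```julia
# Hypothetical type used only to demonstrate a maximally coarse hash.
struct CoarsePath
    commands::Vector{Symbol}
end

Base.:(==)(a::CoarsePath, b::CoarsePath) = a.commands == b.commands

# Every CoarsePath hashes the same. The contract (isequal implies equal
# hashes) holds trivially, so this is correct, just slow for large Dicts.
Base.hash(::CoarsePath, h::UInt) = hash(:CoarsePath, h)

d = Dict(CoarsePath([:move]) => 1, CoarsePath([:line]) => 2)
println(d[CoarsePath([:move])])   # 1
println(d[CoarsePath([:line])])   # 2
println(length(d))                # 2
```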

simeonschaub · December 20, 2021, 9:10pm

Yes, sorry; I made that a little clearer in my answer.

jules · December 20, 2021, 9:12pm

Yes, I want to avoid two bezier paths with the same commands being stored as two different entries in a dict, so I thought I'd make the hash depend on the content of the command vector rather than its identity. Otherwise the dict could fill up with essentially the same path every time a plot is made.
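Deduplication only happens when both hash and isequal are content-based. A sketch of a cache keyed by path, assuming a hypothetical BezierPath with content-based == and hash; render_cache and cached_render are illustrative names, and the joined string stands in for whatever expensive value is actually being cached:

```julia
struct BezierPath
    commands::Vector{Symbol}
end
Base.:(==)(a::BezierPath, b::BezierPath) = a.commands == b.commands
Base.hash(p::BezierPath, h::UInt) = hash(p.commands, hash(:BezierPath, h))

# Hypothetical cache mapping a path to some derived value.
const render_cache = Dict{BezierPath,String}()

function cached_render(p::BezierPath)
    # get! runs the do-block and stores its result only if no isequal key
    # is present yet; otherwise it returns the stored value.
    get!(render_cache, p) do
        join(string.(p.commands), " ")   # stand-in for expensive work
    end
end

# Each "plot" builds a fresh command vector, but the cache keeps one entry.
for _ in 1:3
    cached_render(BezierPath([:move, :line, :close]))
end
println(length(render_cache))   # 1
```

One consequence of a content-based hash: a path's command vector must not be mutated while the path is used as a Dict key, since the stored entry would then sit under a stale hash.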

piever · December 20, 2021, 9:40pm

IIUC, the issue is that the vector of commands is stored in a BezierPath struct, and isequal on BezierPath by default falls back to ===, which compares the struct fields with ===, i.e. compares the vectors by identity.
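The default behavior can be seen with a hypothetical PlainPath type that defines no ==, isequal, or hash methods of its own. Two instances are isequal only when they wrap the very same vector object:

```julia
# Hypothetical path type with no custom ==, isequal, or hash.
struct PlainPath
    commands::Vector{Symbol}
end

v = [:move, :line]
a = PlainPath(v)
b = PlainPath(v)                 # same vector object
c = PlainPath([:move, :line])    # equal contents, different vector

println(isequal(a, b))           # true: the fields are === (same vector)
println(isequal(a, c))           # false: the vectors are different objects
println(length(Set([a, b, c])))  # 2
```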

If you want a BezierPath type where both isequal and hash check for “content equality” rather than “vector identity”, you can use https://github.com/andrewcooke/AutoHashEquals.jl. It should be as simple as adding @auto_hash_equals in front of the struct definition.
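For comparison, here is a hand-written sketch of the same idea without the extra dependency: hash and compare every field of the struct by content. This is a manual approximation, not the code @auto_hash_equals actually generates, and the type FieldwisePath with its fields is hypothetical:

```julia
struct FieldwisePath
    commands::Vector{Symbol}
    closed::Bool
end

# Compare every field with ==, so vector fields are compared by content.
function Base.:(==)(a::FieldwisePath, b::FieldwisePath)
    for f in fieldnames(FieldwisePath)
        getfield(a, f) == getfield(b, f) || return false
    end
    return true
end

# Fold every field into the hash, starting from a type-specific salt.
function Base.hash(p::FieldwisePath, h::UInt)
    h = hash(:FieldwisePath, h)
    for f in fieldnames(FieldwisePath)
        h = hash(getfield(p, f), h)
    end
    return h
end

p = FieldwisePath([:move, :line], true)
q = FieldwisePath([:move, :line], true)
println(p == q, " ", hash(p) == hash(q))   # true true
```

Looping over fieldnames keeps == and hash in sync when fields are added later, which is the main thing a macro like @auto_hash_equals saves you from maintaining by hand.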

Related topics

Hash of Dict with custom type as keys · General Usage · dictionary · 4 replies · 1365 views · September 20, 2021
Should custom hash functions distinguish types? · General Usage · dictionary, hash · 2 replies · 202 views · December 1, 2023
Set inconsistencies with structs · General Usage · question · 4 replies · 481 views · May 6, 2021
Create a set with custom hash and isequal · General Usage · set, hash · 7 replies · 94 views · April 9, 2025
Hashing of different types with same contents · General Usage · 2 replies · 289 views · February 8, 2021