Defining a custom hash function

I have a type BezierPath that contains a vector of commands. I want two bezier paths with the same commands stored in different vectors to have the same hash value, by default they don’t. Is it fine to just overload hash to operate on the vector content? I don’t see how that could have negative consequences other than that it’s not possible anymore to use two different instances as separate keys in a dict if they have the same commands. But that’s kind of the point.

Is that true? I would think they can be separate keys because equality should still be checked for lookup.

julia> mutable struct S end

julia> a = S(); b = S(); a === b
false

julia> Set([a,b])
Set{S} with 2 elements:
  S()
  S()

julia> Base.hash(::S)::UInt = 1

julia> Set([a,b])
Set{S} with 2 elements:
  S()
  S()

You can overload hash for your own types however you like. You must also overload ==. See the docstring for details.

Don’t overload hash for Vector though, that would be type piracy and would break stuff.

You need to ensure hash and isequal are compatible, e.g. isequal(x, y) always implies hash(x) == hash(y). Otherwise the behavior for Sets and Dicts is not well defined. In your case, you are actually creating a hash collision, which is generally fine (although your particular hash function would of course completely defeat the purpose of a hash map in the first place), but since they are not isequal, two separate entries will still be created.

More precisely, hash only needs to be compatible with isequal. isequal falls back to == by default, but == is afforded some more liberties like NaN != NaN for floating point variables.

1 Like

The OP question is whether it is safe to coarsen the hash. Coarsening is always safe but may reduce performance; refining is the risky operation.

1 Like

Yes sorry, I made that a little more clear in my answer.

1 Like

Yes I want to avoid that two bezier paths with the same commands are stored as two different entries in a dict, so I thought I’d make the hash dependent on command vector content, not command vector identity. Otherwise the dict could fill up with essentially the same path every time a plot is made.

IIUC, the issue is that the vector of command is stored in a BezierPath struct, but checking isequal on BezierPath by default will call === on the struct fields.

If you want a BezierPath type where both isequal and hash check for “content equality” rather than “vector identity”, you can use https://github.com/andrewcooke/AutoHashEquals.jl. It should be as simple as adding @auto_hash_equals in front of the struct definition.