I have a type BezierPath
that contains a vector of commands. I want two bezier paths with the same commands stored in different vectors to have the same hash value, by default they don’t. Is it fine to just overload hash
to operate on the vector content? I don’t see how that could have negative consequences other than that it’s not possible anymore to use two different instances as separate keys in a dict if they have the same commands. But that’s kind of the point.
Is that true? I would think they can be separate keys because equality should still be checked for lookup.
julia> mutable struct S end
julia> a = S(); b = S(); a === b
false
julia> Set([a,b])
Set{S} with 2 elements:
S()
S()
julia> Base.hash(::S)::UInt = 1
julia> Set([a,b])
Set{S} with 2 elements:
S()
S()
You can overload hash
for your own types however you like. You must also overload ==
. See the docstring for details.
Don’t overload hash
for Vector
though, that would be type piracy and would break stuff.
You need to ensure hash
and isequal
are compatible, e.g. isequal(x, y)
always implies hash(x) == hash(y)
. Otherwise the behavior for Set
s and Dict
s is not well defined. In your case, you are actually creating a hash collision, which is generally fine (although your particular hash
function would of course completely defeat the purpose of a hash map in the first place), but since they are not isequal
, two separate entries will still be created.
More precisely, hash
only needs to be compatible with isequal
. isequal
falls back to ==
by default, but ==
is afforded some more liberties like NaN != NaN
for floating point variables.
The OP question is whether it is safe to coarsen the hash. Coarsening is always safe but may reduce performance; refining is the risky operation.
Yes sorry, I made that a little more clear in my answer.
Yes I want to avoid that two bezier paths with the same commands are stored as two different entries in a dict, so I thought I’d make the hash dependent on command vector content, not command vector identity. Otherwise the dict could fill up with essentially the same path every time a plot is made.
IIUC, the issue is that the vector of command is stored in a BezierPath
struct, but checking isequal
on BezierPath
by default will call ===
on the struct fields.
If you want a BezierPath
type where both isequal
and hash
check for “content equality” rather than “vector identity”, you can use https://github.com/andrewcooke/AutoHashEquals.jl. It should be as simple as adding @auto_hash_equals
in front of the struct definition.