I was surprised to see that it is not enough to implement isequal to get unique for free, though the documentation seems to suggest it:
https://docs.julialang.org/en/v1/base/collections/#Base.unique
Return an array containing only the unique elements of collection itr , as determined by isequal
The problem seems to be that unique implicitly assumes that struct1 == struct2 implies hash(struct1) == hash(struct2).
I noticed this behaviour for a struct with two fields where the order does not matter for comparison. Here an MWE:
struct UndirectedEdge
f1
f2
end
import Base.==
==(x::UndirectedEdge, y::UndirectedEdge) = (x.f1 == y.f1 && x.f2 == y.f2) || (x.f2 == y.f1 && x.f1 == y.f2)
e1 = UndirectedEdge(:a, :b)
e2 = UndirectedEdge(:b, :a) # equal to e1
e3 = UndirectedEdge(:c, :a)
unique([e1, e2, e3])
e1 == e2
isequal(e1, e2) # isequal should be enough for unique
It works, when I additionally define a hash method:
import Base.hash
hash(x::UndirectedEdge, h::UInt) = hash(sort([x.f1, x.f2]), h)
unique([e1, e2, e3])
It works for me if I define Base.isequal instead of Base.==.
EDIT: It even works when I define Base.== as you did (no Base.hash method necessary).
This is documented in the docs for isequal:
isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x)
== hash(y)
, as well as in the docs of hash. But yes, this invariant is a little to easy to forget, for which there is an old issue: custom hashing is too easy to accidentally break · Issue #12198 · JuliaLang/julia · GitHub
2 Likes
Performance aside, can I define isequal in terms of hashing?
import Base.==
import Base.hash
hash(x::UndirectedEdge, h::UInt) = hash(sort([x.f1, x.f2]), h)
==(x::UndirectedEdge, y::UndirectedEdge) = hash(x) == hash(y)
I think so! But then you might get nasty surprises if you find a hash collision.
Thank you so much. It was not clear to me that I have to implement hash and isequal in tandem even if I do not care too much about hashing.
Sukera
March 16, 2022, 5:02pm
7
If you don’t implement them in tandem, things like putting your custom struct into a Dict may break, as it uses a hash to determine which bucket to place the object. It’s not necessarily related to whether you care about hashing itself, but what you want to do with it.
1 Like