I was surprised to see that it is not enough to implement isequal
to get unique
for free, though the documentation seems to suggest it:
https://docs.julialang.org/en/v1/base/collections/#Base.unique
Return an array containing only the unique elements of collection itr
, as determined by isequal
The problem seems to be that unique
implicitly assumes that struct1 == struct2
implies hash(struct1) == hash(struct2)
.
I noticed this behaviour for a struct with two fields where the order does not matter for comparison. Here an MWE:
struct UndirectedEdge
f1
f2
end
import Base.==
==(x::UndirectedEdge, y::UndirectedEdge) = (x.f1 == y.f1 && x.f2 == y.f2) || (x.f2 == y.f1 && x.f1 == y.f2)
e1 = UndirectedEdge(:a, :b)
e2 = UndirectedEdge(:b, :a) # equal to e1
e3 = UndirectedEdge(:c, :a)
unique([e1, e2, e3])
e1 == e2
isequal(e1, e2) # isequal should be enough for unique
It works, when I additionally define a hash method:
import Base.hash
hash(x::UndirectedEdge, h::UInt) = hash(sort([x.f1, x.f2]), h)
unique([e1, e2, e3])
It works for me if I define Base.isequal
instead of Base.==
.
EDIT: It even works when I define Base.==
as you did (no Base.hash
method necessary).
This is documented in the docs for isequal
:
isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x)
== hash(y)
, as well as in the docs of hash
. But yes, this invariant is a little to easy to forget, for which there is an old issue: custom hashing is too easy to accidentally break · Issue #12198 · JuliaLang/julia · GitHub
2 Likes
Performance aside, can I define isequal
in terms of hashing?
import Base.==
import Base.hash
hash(x::UndirectedEdge, h::UInt) = hash(sort([x.f1, x.f2]), h)
==(x::UndirectedEdge, y::UndirectedEdge) = hash(x) == hash(y)
I think so! But then you might get nasty surprises if you find a hash collision.
Thank you so much. It was not clear to me that I have to implement hash
and isequal
in tandem even if I do not care too much about hashing.
Sukera
March 16, 2022, 5:02pm
7
If you don’t implement them in tandem, things like putting your custom struct into a Dict
may break, as it uses a hash to determine which bucket to place the object. It’s not necessarily related to whether you care about hashing itself, but what you want to do with it.
1 Like