# Help me understand the `unique` semantic for a vector of mutable struct

The `unique` documentation mentions that it is based on the `isequal` function.
Hence I cannot understand the result of the following MWE:

``````Base.@kwdef mutable struct AStruct
toto::String    =  "toto"
tuti::Int       = 2
end

function Base.:(==)(a::AStruct, b::AStruct)
for field ∈ fieldnames(AStruct)
getproperty(a, field) != getproperty(b, field) && return false
end
return true
end

function unique_experiment()
a = [(AStruct("titi",3),"aa"),(AStruct("tutu",3),"aa"),(AStruct("titi",3),"aa"),(AStruct("tete",3),"bb")]
println("lenght(a) = \$(length(a))")
@show a[1]
@show a[3]
@show a[1] == a[3]
ua = unique(a)
println("lenght(ua) = \$(length(ua))")
end
``````

The output is:

``````lenght(a) = 4
a[1] = (PathViz.AStruct("titi", 3), "aa")
a[3] = (PathViz.AStruct("titi", 3), "aa")
a[1] == a[3] = true
lenght(ua) = 4
``````
1 Like

The `==` implementation for `AStruct` and default `isequal` did not fulfill this documented requirement: “`isequal` is the comparison function used by hash tables (`Dict` ). `isequal(x,y)` must imply that `hash(x) == hash(y)` .” As a consequence, `unique` is unable to put them in the same slot of a `Set`.

Behavior you’d like:

``````julia> isequal([1], [1])
true

julia> unique([[1], [1]])
1-element Vector{Vector{Int64}}:
[1]

julia> hash.([[1], [1]])
2-element Vector{UInt64}:
0xc7253bdde85cb9f5
0xc7253bdde85cb9f5
``````

What happens now:

``````julia> isequal((AStruct("titi",3),"aa"), (AStruct("titi",3),"aa"))
true

julia> unique([(AStruct("titi",3),"aa"), (AStruct("titi",3),"aa")])
2-element Vector{Tuple{AStruct, String}}:
(AStruct("titi", 3), "aa")
(AStruct("titi", 3), "aa")

julia> hash.([(AStruct("titi",3),"aa"), (AStruct("titi",3),"aa")])
2-element Vector{UInt64}:
0xd5f0f1ab22ccc31c
0xbec08e442b331e69
``````

Same discrepancy occurs for the `AStruct` instances alone rather than Tuples containing them.

1 Like

Thank you very much for this detailed explanation !

I wonder if some tooling could help developers by forcing them to implement a `hash` method for each custom `isequal` method (I have spent some energy trying to figure out the problem before this discourse post).

Thank you again !

The docstrings do say that custom `isequal`/`==` and `hash` implementations come in pairs. I wish I knew how hashing works to give any suggestions.

You are right, but one has to guess that he may not understand what is the semantic of `isequal` in order to look at the corresponding docstring.

The docstring of `unique` does not make any reference to the `hash` function:

``````help?> unique
search: unique unique! allunique

unique(itr)

Return an array containing only the unique elements of collection itr, as
determined by isequal, in the order that the first of each set of equivalent
elements originally appears. The element type of the input is preserved.

Right, I meant the `==`/`isequal`/`hash` docstrings. It does seem strange to me that `unique` mentions `isequal` in particular when hash table types must use `hash` then `isequal` to do key equality properly.
Note that I think that I would have I consider the `hash` function if I was dealing with `Dict`. But since I was only using regular `Array`s, I have (wrongly) assumed that `unique` implementation only implied a bunch of call to `isequal`.
Maybe the `unique` docstring could be improved, mentioning the `hash` function.