Help me understand the `unique` semantic for a vector of mutable struct

The unique documentation mentions that it is based on the isequal function.
Hence I cannot understand the result of the following MWE:

Base.@kwdef mutable struct AStruct
    toto::String    =  "toto"
    tuti::Int       = 2

function Base.:(==)(a::AStruct, b::AStruct)
    for field ∈ fieldnames(AStruct)
        getproperty(a, field) != getproperty(b, field) && return false
    return true

function unique_experiment()
    a = [(AStruct("titi",3),"aa"),(AStruct("tutu",3),"aa"),(AStruct("titi",3),"aa"),(AStruct("tete",3),"bb")]
    println("lenght(a) = $(length(a))")
    @show a[1]
    @show a[3]
    @show a[1] == a[3]
    ua = unique(a)
    println("lenght(ua) = $(length(ua))")

The output is:

lenght(a) = 4
a[1] = (PathViz.AStruct("titi", 3), "aa")
a[3] = (PathViz.AStruct("titi", 3), "aa")
a[1] == a[3] = true
lenght(ua) = 4
1 Like

The == implementation for AStruct and default isequal did not fulfill this documented requirement: “isequal is the comparison function used by hash tables (Dict ). isequal(x,y) must imply that hash(x) == hash(y) .” As a consequence, unique is unable to put them in the same slot of a Set.

Behavior you’d like:

julia> isequal([1], [1])

julia> unique([[1], [1]])
1-element Vector{Vector{Int64}}:

julia> hash.([[1], [1]])
2-element Vector{UInt64}:

What happens now:

julia> isequal((AStruct("titi",3),"aa"), (AStruct("titi",3),"aa"))

julia> unique([(AStruct("titi",3),"aa"), (AStruct("titi",3),"aa")])
2-element Vector{Tuple{AStruct, String}}:
 (AStruct("titi", 3), "aa")
 (AStruct("titi", 3), "aa")

julia> hash.([(AStruct("titi",3),"aa"), (AStruct("titi",3),"aa")])
2-element Vector{UInt64}:

Same discrepancy occurs for the AStruct instances alone rather than Tuples containing them.

1 Like

Thank you very much for this detailed explanation !

I wonder if some tooling could help developers by forcing them to implement a hash method for each custom isequal method (I have spent some energy trying to figure out the problem before this discourse post).

Thank you again !

The docstrings do say that custom isequal/== and hash implementations come in pairs. I wish I knew how hashing works to give any suggestions.

You are right, but one has to guess that he may not understand what is the semantic of isequal in order to look at the corresponding docstring.

The docstring of unique does not make any reference to the hash function:

help?> unique
search: unique unique! allunique


  Return an array containing only the unique elements of collection itr, as
  determined by isequal, in the order that the first of each set of equivalent
  elements originally appears. The element type of the input is preserved.

  See also: unique!, allunique, allequal.

Right, I meant the ==/isequal/hash docstrings. It does seem strange to me that unique mentions isequal in particular when hash table types must use hash then isequal to do key equality properly.

1 Like

Note that I think that I would have I consider the hash function if I was dealing with Dict. But since I was only using regular Arrays, I have (wrongly) assumed that unique implementation only implied a bunch of call to isequal.

Maybe the unique docstring could be improved, mentioning the hash function.

1 Like