Help me understand the `unique` semantic for a vector of mutable struct

The unique documentation mentions that it is based on the isequal function.
Hence I cannot understand the result of the following MWE:

Base.@kwdef mutable struct AStruct
    toto::String    =  "toto"
    tuti::Int       = 2
end      

function Base.:(==)(a::AStruct, b::AStruct)
    for field ∈ fieldnames(AStruct)
        getproperty(a, field) != getproperty(b, field) && return false
    end
    return true
end

function unique_experiment()
    a = [(AStruct("titi",3),"aa"),(AStruct("tutu",3),"aa"),(AStruct("titi",3),"aa"),(AStruct("tete",3),"bb")]
    println("lenght(a) = $(length(a))")
    @show a[1]
    @show a[3]
    @show a[1] == a[3]
    ua = unique(a)
    println("lenght(ua) = $(length(ua))")
end

The output is:

lenght(a) = 4
a[1] = (PathViz.AStruct("titi", 3), "aa")
a[3] = (PathViz.AStruct("titi", 3), "aa")
a[1] == a[3] = true
lenght(ua) = 4
1 Like

The == implementation for AStruct and default isequal did not fulfill this documented requirement: “isequal is the comparison function used by hash tables (Dict ). isequal(x,y) must imply that hash(x) == hash(y) .” As a consequence, unique is unable to put them in the same slot of a Set.

Behavior you’d like:

julia> isequal([1], [1])
true

julia> unique([[1], [1]])
1-element Vector{Vector{Int64}}:
 [1]

julia> hash.([[1], [1]])
2-element Vector{UInt64}:
 0xc7253bdde85cb9f5
 0xc7253bdde85cb9f5

What happens now:

julia> isequal((AStruct("titi",3),"aa"), (AStruct("titi",3),"aa"))
true

julia> unique([(AStruct("titi",3),"aa"), (AStruct("titi",3),"aa")])
2-element Vector{Tuple{AStruct, String}}:
 (AStruct("titi", 3), "aa")
 (AStruct("titi", 3), "aa")

julia> hash.([(AStruct("titi",3),"aa"), (AStruct("titi",3),"aa")])
2-element Vector{UInt64}:
 0xd5f0f1ab22ccc31c
 0xbec08e442b331e69

Same discrepancy occurs for the AStruct instances alone rather than Tuples containing them.

1 Like

Thank you very much for this detailed explanation !

I wonder if some tooling could help developers by forcing them to implement a hash method for each custom isequal method (I have spent some energy trying to figure out the problem before this discourse post).

Thank you again !

The docstrings do say that custom isequal/== and hash implementations come in pairs. I wish I knew how hashing works to give any suggestions.

You are right, but one has to guess that he may not understand what is the semantic of isequal in order to look at the corresponding docstring.

The docstring of unique does not make any reference to the hash function:

help?> unique
search: unique unique! allunique

  unique(itr)

  Return an array containing only the unique elements of collection itr, as
  determined by isequal, in the order that the first of each set of equivalent
  elements originally appears. The element type of the input is preserved.

  See also: unique!, allunique, allequal.

Right, I meant the ==/isequal/hash docstrings. It does seem strange to me that unique mentions isequal in particular when hash table types must use hash then isequal to do key equality properly.

1 Like

Note that I think that I would have I consider the hash function if I was dealing with Dict. But since I was only using regular Arrays, I have (wrongly) assumed that unique implementation only implied a bunch of call to isequal.

Maybe the unique docstring could be improved, mentioning the hash function.

1 Like