Create a set with custom hash and isequal

Is it possible to create a Set{MyType} that uses a custom hash and isequal functions s.t. it is essentially a set of equivalence classes w.r.t the equivalence relation defined by my hash and isequal? I want to do this without loosing my standard isequal that checks for full equality, and not only equality w.r.t to my equivalence relation.

See below for the desired behaviour:

struct MyType
    coeff::Float64
    indices::Vector{Int}
end

Base.hash(x::MyType, h::UInt) = hash(x.indices, h)
Base.isequal(x::MyType, y::MyType) = (x.coeff == y.coeff) && (x.indices == y.indices)

a = MyType(1., [2, 3])
b = MyType(2., [2, 3])

a == b
# False, this is desired

s = Set([a, b])
# Set{MyType} with 2 elements:
#   MyType(1.0, [2, 3])
#   MyType(2.0, [2, 3])

# But I want this to be
# Set{MyType} with 1 element:
#   MyType(1.0, [2, 3])
#
# or 
#
# Set{MyType} with 1 element:
#   MyType(2.0, [2, 3])

I’m not aware of a set type that does this. However, I think this would be a useful feature. For example, I might have a fast hash function for some custom type V{T} <: AbstractVector{T}, but it is not compatible with the default hash for vectors. This incompatibility does not matter for a Set with element type V{T} (or a Dict with this key type). Hence the custom hash function would speed things up in such a scenario.

I can think of a solution:

struct MyType
    coeff::Float64
    indices::Vector{Int}
end

Base.hash(x::MyType, h::UInt) = hash(x.coeff, hash(x.indices, h))
Base.isequal(x::MyType, y::MyType) = (x.coeff == y.coeff) && (x.indices == y.indices)

a = MyType(1., [2, 3])
b = MyType(2., [2, 3])

# False
@info a == b 
# Set(MyType[MyType(1.0, [2, 3]), MyType(2.0, [2, 3])])
@info Set([a, b]) 

struct MyWrapper
    x::MyType
end

Base.hash(u::MyWrapper, h::UInt) = hash(u.x.indices, h)
Base.isequal(u::MyWrapper, v::MyWrapper) = isequal(u.x.indices, v.x.indices)

# Set(MyWrapper[MyWrapper(MyType(2.0, [2, 3]))])
@info Set([MyWrapper(a), MyWrapper(b)]) 

It’s verbose, I would say. But it’s decently clean: Two MyTypes are equivalent if coeffs and indicess match; two MyWrappers are equivalent if indicess of the underlying MyTypes match.

1 Like

In the end I settled to represent my Set{MyType} as a wrapped Dict{Vector{Int}, Float64}. Some (un)wrapping was required to turn insertion of a MyType into insertion of a Vector{Int} => Float64 into that dict and make iteration yield MyType instead of Pair{Vector{Int}, Float64}, but I hope that the compiler is able to pretty much compile all these (un)wrapping operations away.

Great minds think alike :smiley:

Your solutoin works as well and is similar to my solution above in that additional (un)wrapping would be needed to be able to directly insert MyTypes and have iteration yield MyTypes instead of MyWrapper.

:grinning_face:

Using Dict is way more convenient. You can avoid wrapping/unwrapping too:

function store_mytype!(dict::AbstractDict, x::MyType)
    get!(dict, x.indices) do
        x
    end
end

dict = Dict{Vector{Int}, MyType}()
store_mytype!(dict, a)
store_mytype!(dict, b)

# Or use: store_mytype!.(Ref(dict), [a, b])

# MyType[MyType(1.0, [2, 3])]
@info values(dict)

See:

Sure, but then you have to unwrap the elements again when you want to use methods defined for the original type – if that it possible at all. Maybe the elements are used inside a function defined in some third-party package. It would be nice if everything could be done under the hood.