Set with type with custom equality

I have a type whose instance is equivalent to instances that are in some distance away:

struct VectorWithEpsRadius{T}
    value::T
    eps::Float64
end

function ==(a::VectorWithEpsRadius, b::VectorWithEpsRadius)
    return norm(a.value - b.value) <= a.eps
end

I would like to use this inside a Set. However the behavior is not what I expect:

f(x) = VectorWithEpsRadius{Int64}(x, 10)
Base.hash(m::VectorWithEpsRadius{T}, h::UInt) where T = Base.hash(m.value, h)
println(f(10) == f(11)) # true
println(hash(f(10)) == hash(f(11))) #  false

and

julia> s = Set{ChaosTools.VectorWithEpsRadius{Int64}}()
Set{ChaosTools.VectorWithEpsRadius{Int64}}()

julia> push!(s, f(21))
Set{ChaosTools.VectorWithEpsRadius{Int64}} with 1 element:
  ChaosTools.VectorWithEpsRadius{Int64}(21, 10.0)

julia> push!(s, f(22))
Set{ChaosTools.VectorWithEpsRadius{Int64}} with 2 elements:
  ChaosTools.VectorWithEpsRadius{Int64}(22, 10.0)
  ChaosTools.VectorWithEpsRadius{Int64}(21, 10.0)

I want the set to contain only the first element and not the second. I want those two elements to be considered equal.

How to solve this?

Thanks

This is not an equivalence relation because it is not transitive, so that may cause problems in code that expects == to behave normally. (See also the discussion in `unique()` with `isapprox()` instead of `isequal()` · Issue #19147 · JuliaLang/julia · GitHub)

The hash documentation says that it should obey:

Compute an integer hash code such that isequal(x,y) implies hash(x)==hash(y).

Your hash and == (called by isequal) definitions are inconsistent with this requirement, since two values that compare == can have different hashes. So, you should expect any algorithm that uses hashes (sets, dictionaries, …) to break for your type.

Because your == is not an equivalence relation, it’s not clear what a “set” even means. For example, suppose you specify eps=0.15 and take the numbers

[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]

Each one of these numbers is “==” to the next number with eps=0.15, so does a "set’ contain only one element? But because your == is not transitive, most of these numbers are not “==” to one another, so should the “set” contain many elements? Which ones?

Basically, you need to decide more carefully what you want to do here. For example, one “solution” would be to round all of your values to a certain number of digits, and then use the rounded values for comparisons and hashing. This would be an equivalence relation, but may or may not be what you want.

2 Likes

I don’t intend to form a set out of an array. I only want limited functionality of the set and I understand that other things could break down. Basically I need a datastructure with fast insertion and lookup (no ordering needed). My hopes were to insert element into an empty set and then the second insertion would only be allowed if the second element is in the correct distance.

The datastructure behavior that I want currently works with a RBTree{VectorWithEpsRadius} however I was facing some bugs with == which I can’t easily reproduce here. That’s why I am looking for an alternative.

Rounding numbers is not an option, this is exactly what I want to avoid.

My solution is as follows. It’s not optimal but benchmarks better than my previous approach with binary tree.

function mypush!(set, x, abs)
    previously_detected(set, x, abs) ? nothing : push!(set, x)
end

function previously_detected(set, x, abs)
    for i in set
        if norm(x - i) < abs
            return true
        end
    end
    return false
end

I insert into the set with mypush! only if I am sure that I want the element there. I need to do the linear search which is not the best. However insertion into the set doesn’t allocate and should happen in constant time.