How to make a Set of real values based on rtol?

Let's say you wanted to make a Set of real numbers:

Set( [ 
    1.0,
    1+eps(),
    2
] )

For all practical purposes, this should reduce to Set([1.0, 2.0]).

Is there an existing data structure that allows something like Set(..., rtol=1e-8) to provide this functionality, while keeping all the features of a Set?


edit: I guess what I’m asking is:

  • is it possible to change the equality condition for Set checking?

Set is based on hashing, and there can't be a hash function consistent with your notion of equality, because that notion is not transitive:

julia> a,b,c = [1.0 .+ i * 5e7 * eps() for i in 0:2];

julia> isapprox(a,b)
true

julia> isapprox(b,c)
true

julia> isapprox(a,c)
false
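
Since hashing needs a transitive equality, one possible workaround is to map every value to a canonical representative before it goes into the Set, for example by rounding to a fixed number of significant digits. The sketch below assumes that is acceptable for your data; `rounded_set` is a hypothetical helper name, not something from Base. Note that this is not the same as isapprox: two values that are very close but fall on opposite sides of a rounding boundary stay distinct.

```julia
# Hypothetical helper (not part of Base): build a Set after rounding each value
# to `sigdigits` significant digits. Rounding is a transitive equivalence,
# so ordinary hashing works on the rounded values.
rounded_set(values; sigdigits=8) = Set(round(v; sigdigits=sigdigits) for v in values)

s = rounded_set([1.0, 1 + eps(), 2])
# s == Set([1.0, 2.0])
```

The result is an ordinary Set, so union, intersect, in and so on keep working, but only on the rounded representatives.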

Because you want to do this for real numbers, which are ordered, you can keep a sorted list of members instead. Deciding whether an element is already present then takes O(log n) comparisons rather than O(n), since only the neighbours of the insertion point need to be checked. (Inserting into a plain Vector still shifts the elements after that point.) You should be able to use searchsorted for this.
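
A minimal sketch of the sorted-list idea, assuming a Vector{Float64} that is kept sorted and a relative tolerance smaller than 1. `insert_approx!` is a hypothetical name, and the sketch uses searchsortedfirst (a sibling of searchsorted in Base) to find the insertion point:

```julia
# Hypothetical helper: insert `x` into the sorted vector `xs` unless one of the
# two neighbours of its insertion point is already approximately equal to it.
function insert_approx!(xs::Vector{Float64}, x::Real; rtol=1e-8)
    i = searchsortedfirst(xs, x)   # first index with xs[i] >= x (binary search)

    # Right neighbour: the smallest stored value that is >= x.
    if i <= length(xs) && isapprox(xs[i], x; rtol=rtol)
        return xs
    end
    # Left neighbour: the largest stored value that is < x.
    if i > 1 && isapprox(xs[i-1], x; rtol=rtol)
        return xs
    end

    insert!(xs, i, x)              # keeps `xs` sorted
    return xs
end

xs = Float64[]
for v in (1.0, 1 + eps(), 2)
    insert_approx!(xs, v)
end
# xs == [1.0, 2.0]
```

Because the equality is not transitive, the result can still depend on insertion order: whichever value arrives first becomes the representative of its neighbourhood.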


Or, for a simple approach that compares every element against all earlier ones:

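# Remove, in place, every element that is approximately equal to an earlier element.
function filter_approx!(cur_vector; atol=5e-2)
  delete_indices = Int[]

  for (cur_index, cur_value) in enumerate(cur_vector)
    # Compare against all earlier elements; a view avoids copying the slice.
    is_duplicate = any(
      tmp_value -> isapprox(tmp_value, cur_value, atol=atol),
      view(cur_vector, 1:cur_index-1)
    )

    is_duplicate || continue
    push!(delete_indices, cur_index)
  end

  # The indices were collected in increasing order, as deleteat! requires.
  deleteat!(cur_vector, delete_indices)

  cur_vector
end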

For example:

tmp_vector = [1.0 .+ i * 5e7 * eps() for i in 0:2]
filter_approx!(tmp_vector)

println(tmp_vector)

>> [1.0]


Further convenience methods:

# Push cur_value only if no existing element is approximately equal to it.
function approx_push!(cur_vector, cur_value; atol=5e-2)
  is_duplicate = any(
    tmp_value -> isapprox(tmp_value, cur_value, atol=atol),
    cur_vector
  )

  is_duplicate && return cur_vector
  push!(cur_vector, cur_value)
end

# Append other_vector, then drop approximate duplicates from the combined vector.
function approx_append!(cur_vector, other_vector; atol=5e-2)
  append!(cur_vector, other_vector)
  filter_approx!(cur_vector; atol=atol)

  cur_vector
end