Findnearest function?

Is there a simple findnearest() function somewhere in Julia-land?

A = [1, 5, 12, 23];
findnearest(A, 14)  
# ->  3
findnearest(A, 18)
# ->  4 
3 Likes

I’ll take a shot at it:

findnearest(A::AbstractArray,t) = findmin(abs(A-t))[2]

Remove the [2] to return nearest value as well. Not sure how to accommodate ties though?

3 Likes

If the array is sorted, like in the post, then the following will return a range with the nearest values:

function findnearest(a,x)
       n = length(a)
       n > 0 || return 0:-1
       i1 = searchsortedlast(a,x)
       i0 = i1
       if i1>0
           while i0>1 && a[i0-1]==a[i0]
               i0 -= 1
           end
           d = x-a[i1]
       else
           i0 = i1+1
           d = a[i1+1]-x
       end
       i2 = i1
       if i2<n && a[i2+1]-x<d
           i0 = i2+1
           d = a[i2+1]-x
           i2 += 1
       end
       while i2<n && a[i2+1]-x==d
           i2 += 1
       end
       return i0:i2
end

Note the details, such as returning a zero-length range when array is empty, and returning a range with values on both sides of x when they are of equal distance.

The OP’s example reproduces as:

julia> A = [1, 5, 12, 23];
julia> findnearest(A, 14)
3:3
julia> # ->  3
       findnearest(A, 18)
4:4
julia> # ->  4
1 Like

Just to clean things a bit, the last version works, but can take linear time on some inputs (such as a constant long vector).
The following is cleaner and uses just the sorted search functions which are sublinear.

function findnearest(a,x)
       length(a) > 0 || return 0:-1
       r = searchsorted(a,x)
       length(r) > 0 && return r
       last(r) < 1 && return searchsorted(a,a[first(r)])
       first(r) > length(a) && return searchsorted(a,a[last(r)])
       x-a[last(r)] < a[first(r)]-x && return searchsorted(a,a[last(r)])
       x-a[last(r)] > a[first(r)]-x && return searchsorted(a,a[first(r)])
       return first(searchsorted(a,a[last(r)])):last(searchsorted(a,a[first(r)]))
end

Again, we have:

julia> A = [1, 5, 12, 23];
julia> findnearest(A, 14)
3:3
julia> findnearest(A, 18)
4:4
1 Like

Like @cormullion, I’ve had need for such a function many times, and here is my personal solution for it. Unlike @Dan’s version, this doesn’t care about ties and returns just a single index. Also, as it works on sorted arrays, the function name should be searchsortednearest to be consistent with Julia’s function names (I would expect findnearest to work on any array).

function searchsortednearest(a,x)
   idx = searchsortedfirst(a,x)
   if (idx==1); return idx; end
   if (idx>length(a)); return length(a); end
   if (a[idx]==x); return idx; end
   if (abs(a[idx]-x) < abs(a[idx-1]-x))
      return idx
   else
      return idx-1
   end
end
5 Likes

Neat! By the way, currently Julia 0.6 wants me to:

WARNING: abs{T <: Number}(x::AbstractArray{T}) is deprecated, use abs.(x) instead.

Some good solutions - the simpler offering (@traktofon) benchmarks a bit faster, but both are fine for my needs.

(How about adding a version to Julia, since it appears to be useful? :))

Since it assumes a sorted container, the speed advantage is not surprising.

AFAIK the core developers want to make Julia smaller, and put such functionality in packages. So perhaps you could find a package to submit this to.

@traktofon Thanks for your solution!

I’ve copied it enough times that I’m putting it into a package (and adding the keyword args supported by searchsortedfirst/searchsortedlast). If anyone has improvements, please let me know!

https://github.com/joshday/SearchSortedNearest.jl

6 Likes

@joshday, as you have studied this topic, could you please highlight the advantage(s) of the the proposed function compared to the fast and more general solution (similar to one given above):

findnearest(A,x) = argmin(abs.(A .- x))

Thank you.

1 Like

Fast is relative, and since you’re essentially competing with searchsortedfirst in Base, you probably won’t be faster.

  1. If you try some benchmarks, I’d expect your findnearest to have a lot more allocations.

  2. Since you’re not assuming the collection is sorted, you must visit every element.

2 Likes

Thanks for the feedback.
Wondering what results you get for the following example where the input array is sorted in reverse order?

A = [23, 12, 5, 1]
x = 14
using BenchmarkTools
@btime findnearest($A, $x)  # 25 ns (1 allocation: 96 bytes)

Would you be willing to open a PR to add this function to base? Especially in light of how complementary it is to the existing searchsorted functions, I shouldn’t expect much/any objection to its addition.

searchsortednearest obviously won’t work for an unsorted array. These are two very different functions with different use-cases. If you’re interested in optimizing the general findnearest, you can take a look at this workbook: https://github.com/JuliaComputing/Training/blob/master/Parallel/020%20Serial%20Performance.jl

2 Likes

I think it will take some battle-testing/more thought before adding to Base.

The difference from the other searchsorted functions is that they require only that the type is ordered.

searchsortednearest also requires that a difference/distance exists. There probably needs to be a distance keyword that falls back to -.


At a more abstract level, what we’re doing is finding the argmin of a convex function over a finite domain. Using upcoming 1.7 syntax, we’re doing

julia> A = 1:100; x=5.1;

julia> argmin(ai -> abs(ai - x), A) 
5

But because we know the function is convex we want to stop searching through A once the function stops decreasing.

All that to say I’m not sure whether this functionality should belong with the searchsorted family or if there should be a way to tell argmin my function is convex (or tell argmax my function is concave) and my collection is sorted.

EDIT: I’ve mistaken the behavior of argmin(f, collection) (returns value) for argmin(collection) (returns index).

1 Like

Matt, the array in the example is sorted in reverse order and in the package linked by Josh, the syntax seems to allow this?

searchsortednearest(a, x; by=<transform>, lt=<comparison>, rev=false)

Oh that’s great — then you can just test it yourself. For what it’s worth, I’d expect the simple linear search to be faster than a sorted search for very tiny arrays like this, especially when optimized to remove allocations as is done in those training materials.

1 Like

This returns an element of A and not its index - unlike other solutions.

1 Like

Oh, you’re right! Hopefully my line of thought was clear enough.

1 Like

about

joshday/ SearchSortedNearest.jl

  • Is this a subset of ANN((approximate nearest neighbor) or KNN or similarity search ??

  • Can a / x be extended to higher dimensions, not only a number ??

@joshday

1 Like