Is there a simple findnearest()
function somewhere in Julia-land?
A = [1, 5, 12, 23];
findnearest(A, 14)
# -> 3
findnearest(A, 18)
# -> 4
Is there a simple findnearest()
function somewhere in Julia-land?
A = [1, 5, 12, 23];
findnearest(A, 14)
# -> 3
findnearest(A, 18)
# -> 4
I’ll take a shot at it:
findnearest(A::AbstractArray,t) = findmin(abs(A-t))[2]
Remove the [2]
to return nearest value as well. Not sure how to accommodate ties though?
If the array is sorted, like in the post, then the following will return a range with the nearest values:
function findnearest(a,x)
n = length(a)
n > 0 || return 0:-1
i1 = searchsortedlast(a,x)
i0 = i1
if i1>0
while i0>1 && a[i0-1]==a[i0]
i0 -= 1
end
d = x-a[i1]
else
i0 = i1+1
d = a[i1+1]-x
end
i2 = i1
if i2<n && a[i2+1]-x<d
i0 = i2+1
d = a[i2+1]-x
i2 += 1
end
while i2<n && a[i2+1]-x==d
i2 += 1
end
return i0:i2
end
Note the details, such as returning a zero-length range when array is empty, and returning a range with values on both sides of x when they are of equal distance.
The OP’s example reproduces as:
julia> A = [1, 5, 12, 23];
julia> findnearest(A, 14)
3:3
julia> # -> 3
findnearest(A, 18)
4:4
julia> # -> 4
Just to clean things a bit, the last version works, but can take linear time on some inputs (such as a constant long vector).
The following is cleaner and uses just the sorted
search functions which are sublinear.
function findnearest(a,x)
length(a) > 0 || return 0:-1
r = searchsorted(a,x)
length(r) > 0 && return r
last(r) < 1 && return searchsorted(a,a[first(r)])
first(r) > length(a) && return searchsorted(a,a[last(r)])
x-a[last(r)] < a[first(r)]-x && return searchsorted(a,a[last(r)])
x-a[last(r)] > a[first(r)]-x && return searchsorted(a,a[first(r)])
return first(searchsorted(a,a[last(r)])):last(searchsorted(a,a[first(r)]))
end
Again, we have:
julia> A = [1, 5, 12, 23];
julia> findnearest(A, 14)
3:3
julia> findnearest(A, 18)
4:4
Like @cormullion, I’ve had need for such a function many times, and here is my personal solution for it. Unlike @Dan’s version, this doesn’t care about ties and returns just a single index. Also, as it works on sorted arrays, the function name should be searchsortednearest
to be consistent with Julia’s function names (I would expect findnearest
to work on any array).
function searchsortednearest(a,x)
idx = searchsortedfirst(a,x)
if (idx==1); return idx; end
if (idx>length(a)); return length(a); end
if (a[idx]==x); return idx; end
if (abs(a[idx]-x) < abs(a[idx-1]-x))
return idx
else
return idx-1
end
end
Neat! By the way, currently Julia 0.6 wants me to:
WARNING: abs{T <: Number}(x::AbstractArray{T}) is deprecated, use abs.(x) instead.
Some good solutions - the simpler offering (@traktofon) benchmarks a bit faster, but both are fine for my needs.
(How about adding a version to Julia, since it appears to be useful? :))
Since it assumes a sorted container, the speed advantage is not surprising.
AFAIK the core developers want to make Julia smaller, and put such functionality in packages. So perhaps you could find a package to submit this to.
@traktofon Thanks for your solution!
I’ve copied it enough times that I’m putting it into a package (and adding the keyword args supported by searchsortedfirst
/searchsortedlast
). If anyone has improvements, please let me know!
@joshday, as you have studied this topic, could you please highlight the advantage(s) of the the proposed function compared to the fast and more general solution (similar to one given above):
findnearest(A,x) = argmin(abs.(A .- x))
Thank you.
Fast is relative, and since you’re essentially competing with searchsortedfirst
in Base, you probably won’t be faster.
If you try some benchmarks, I’d expect your findnearest
to have a lot more allocations.
Since you’re not assuming the collection is sorted, you must visit every element.
Thanks for the feedback.
Wondering what results you get for the following example where the input array is sorted in reverse order?
A = [23, 12, 5, 1]
x = 14
using BenchmarkTools
@btime findnearest($A, $x) # 25 ns (1 allocation: 96 bytes)
Would you be willing to open a PR to add this function to base? Especially in light of how complementary it is to the existing searchsorted
functions, I shouldn’t expect much/any objection to its addition.
searchsortednearest
obviously won’t work for an unsorted array. These are two very different functions with different use-cases. If you’re interested in optimizing the general findnearest
, you can take a look at this workbook: https://github.com/JuliaComputing/Training/blob/master/Parallel/020%20Serial%20Performance.jl
I think it will take some battle-testing/more thought before adding to Base.
The difference from the other searchsorted
functions is that they require only that the type is ordered.
searchsortednearest
also requires that a difference/distance exists. There probably needs to be a distance
keyword that falls back to -
.
At a more abstract level, what we’re doing is finding the argmin
of a convex function over a finite domain. Using upcoming 1.7 syntax, we’re doing
julia> A = 1:100; x=5.1;
julia> argmin(ai -> abs(ai - x), A)
5
But because we know the function is convex we want to stop searching through A
once the function stops decreasing.
All that to say I’m not sure whether this functionality should belong with the searchsorted
family or if there should be a way to tell argmin
my function is convex (or tell argmax
my function is concave) and my collection is sorted.
EDIT: I’ve mistaken the behavior of argmin(f, collection)
(returns value) for argmin(collection)
(returns index).
Matt, the array in the example is sorted in reverse order and in the package linked by Josh, the syntax seems to allow this?
searchsortednearest(a, x; by=<transform>, lt=<comparison>, rev=false)
Oh that’s great — then you can just test it yourself. For what it’s worth, I’d expect the simple linear search to be faster than a sorted search for very tiny arrays like this, especially when optimized to remove allocations as is done in those training materials.
This returns an element of A
and not its index - unlike other solutions.
Oh, you’re right! Hopefully my line of thought was clear enough.
about
Is this a subset of ANN((approximate nearest neighbor) or KNN or similarity search ??
Can a / x be extended to higher dimensions, not only a number ??