Nearest Neighbor Search for points consisting of vectors/arrays

FrootLoops · May 6, 2020, 3:04pm

Hello everybody,

I am trying to use the nearest neighbor package in Julia. I read the documentary and it was very easy to understand the examples for one dimensional problems. In my case it is a little bit more complicated.

I have a given point p = (x,y) where x and y are both 3 dimensional vectors. My data is a set consisting of points p_i = (x_i, y_i) where x_i and y_i are also 3 dimensional vectors. Now I want to find the nearest point p_i to p such that

min (|| x - x_i ||^2 + || y - y_i ||^2)

So far I tried a little bit, but I am always getting different kind of error messages. In the case below I used arrays instead of “real” vectors. But I am open to any suggestions for improvement.

# Create random data point p_i = (x_i, y_i) with x_i, y_i vectors
data = Array{Array{Float64, 1}}[]
for k in 1:10
    x_i = rand(Float64, (3))
    y_i = rand(Float64, (3))
    push!(data, [x_i, y_i])
end

# Create random point p = (x, y) with x,y vectors
point = Array{Array{Float64, 1}}[]
x = [0.1, 0.2, 0.3]
y = [0.5, 0.6, 0.7]
push!(point, [x, y])

# Find nearest point p_i in data to point p 
k = 3
kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)

FrootLoops · May 6, 2020, 8:21pm

Okay I found a workaround. My idea is to create a new data set consisting of vectors with entries x_i and y_i such that each data element is a vector with 6 entries.

In theory it would be something like:

data = Array{Float64, 1}[]
# Create random data point p_i = (x_i, y_i) with x_i, y_i vectors
for k in 1:10
    x_i = rand(Float64, (3))
    y_i = rand(Float64, (3))
    new_point = append!(x_i, y_i)
    push!(data, new_point)
end

# point p = (x, y) with x,y vectors
x = [0.1, 0.2, 0.3]
y = [0.5, 0.6, 0.7]
point = append!(x, y)

stevengj · May 6, 2020, 8:31pm

Exactly. However, you should use StaticArrays.jl for this kind of thing (arrays of fixed-length short arrays) — it will be dramatically faster.

That is, your 3-component points should be x = SVector(0.1, 0.2, 0.3) etcetera, and your 6-component points should be SVector(x..., y...) (6-component SVectors). And your point array then becomes a Vector{SVector{6,Float64}} (a 1d Array of 6-component SVector).

You should see big performance boosts — because the length of an SVector is known at compile time, many optimizations suddenly become possible.

FrootLoops · May 6, 2020, 8:35pm

Thank you stevengj. I mean in this case it is working with the idea of append both vectors. My other question is now, how would you solve this problem, if you have for two different norms i.e.measuring the distance of the x_i to x by a different norm as y_i to y.

stevengj · May 6, 2020, 9:46pm

You could define a custom type MyType that stores both x and y, and overload +, -, and norm for MyType:

MyType{T<:Number}
    x::SVector{3,T}
    y::SVector{3,T}
end
MyType(x,y) = MyType{promote_type(eltype(x),eltype(y))}(x, y)
Base.:+(a::MyType, b::MyType) = MyType(a.x+b.x, a.y+b.y)
Base.:-(a::MyType, b::MyType) = MyType(a.x-b.x, a.y-b.y)
LinearAlgebra.norm(a::MyType) = ....some norm....

(You could define other functions too, e.g. zero for the additive identity, * for scalars, etcetera, but you may not need any more methods for KDTree.)

Topic		Replies	Views
How to compute neighbour network using NearestNeighbors.jl Specific Domains	4	1515	January 23, 2017
Closest distance between two Euclidian grids General Usage question	19	1276	June 5, 2021
Vector databases General Usage vector-search	5	566	June 20, 2024
Creating graphs from data Statistics	7	1767	August 7, 2019
[ANN] VPTrees.jl - nearest neighbor search in metric spaces Package Announcements	20	2689	January 30, 2020

Nearest Neighbor Search for points consisting of vectors/arrays

Related topics