Nearest Neighbor Search for points consisting of vectors/arrays

Hello everybody,

I am trying to use the nearest neighbor package in Julia. I read the documentary and it was very easy to understand the examples for one dimensional problems. In my case it is a little bit more complicated.

I have a given point p = (x,y) where x and y are both 3 dimensional vectors. My data is a set consisting of points p_i = (x_i, y_i) where x_i and y_i are also 3 dimensional vectors. Now I want to find the nearest point p_i to p such that

min (|| x - x_i ||^2 + || y - y_i ||^2)

So far I tried a little bit, but I am always getting different kind of error messages. In the case below I used arrays instead of β€œreal” vectors. But I am open to any suggestions for improvement.

# Create random data point p_i = (x_i, y_i) with x_i, y_i vectors
data = Array{Array{Float64, 1}}[]
for k in 1:10
    x_i = rand(Float64, (3))
    y_i = rand(Float64, (3))
    push!(data, [x_i, y_i])
end

# Create random point p = (x, y) with x,y vectors
point = Array{Array{Float64, 1}}[]
x = [0.1, 0.2, 0.3]
y = [0.5, 0.6, 0.7]
push!(point, [x, y])

# Find nearest point p_i in data to point p 
k = 3
kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)

Okay I found a workaround. My idea is to create a new data set consisting of vectors with entries x_i and y_i such that each data element is a vector with 6 entries.

In theory it would be something like:

data = Array{Float64, 1}[]
# Create random data point p_i = (x_i, y_i) with x_i, y_i vectors
for k in 1:10
    x_i = rand(Float64, (3))
    y_i = rand(Float64, (3))
    new_point = append!(x_i, y_i)
    push!(data, new_point)
end

# point p = (x, y) with x,y vectors
x = [0.1, 0.2, 0.3]
y = [0.5, 0.6, 0.7]
point = append!(x, y)


Exactly. However, you should use StaticArrays.jl for this kind of thing (arrays of fixed-length short arrays) β€” it will be dramatically faster.

That is, your 3-component points should be x = SVector(0.1, 0.2, 0.3) etcetera, and your 6-component points should be SVector(x..., y...) (6-component SVectors). And your point array then becomes a Vector{SVector{6,Float64}} (a 1d Array of 6-component SVector).

You should see big performance boosts β€” because the length of an SVector is known at compile time, many optimizations suddenly become possible.

1 Like

Thank you stevengj. I mean in this case it is working with the idea of append both vectors. My other question is now, how would you solve this problem, if you have for two different norms i.e.measuring the distance of the x_i to x by a different norm as y_i to y.

You could define a custom type MyType that stores both x and y, and overload +, -, and norm for MyType:

MyType{T<:Number}
    x::SVector{3,T}
    y::SVector{3,T}
end
MyType(x,y) = MyType{promote_type(eltype(x),eltype(y))}(x, y)
Base.:+(a::MyType, b::MyType) = MyType(a.x+b.x, a.y+b.y)
Base.:-(a::MyType, b::MyType) = MyType(a.x-b.x, a.y-b.y)
LinearAlgebra.norm(a::MyType) = ....some norm....

(You could define other functions too, e.g. zero for the additive identity, * for scalars, etcetera, but you may not need any more methods for KDTree.)

1 Like