NearestNeighbors.jl is the first package I created with the initial commit now being (slightly over) 10 years ago.
In fact, doing a nearest neighbors query was the very first code I wrote in Julia (with the Julia LightTable plugin) when I was a teachers assistant in a Python course where
we used k-d trees for an assignment and I wanted to experiment a bit with other languages.
The package has been mostly in maintenance mode for a while, but I recently added some new features that I would like to share. For more details you can look in the README of the package.
Periodic boundary conditions
NearestNeighbors can now do searches assuming that points are periodically repeating within some bounding box. This is done by wrapping a tree and specifying the bounding-box extents:
using NearestNeighbors
kdtree = KDTree(data)
p_kdtree = PeriodicTree(kdtree, [0.0, 0.0], [1.0, 1.0])
The API is then identical to any other tree in NearestNeighbors.
Self queries
It is now possible to do both range and knn self-searches across all points in a tree with inrange_pairs and allknn:
julia> d = rand(3,10); tree = KDTree(d);
julia> range = 0.3
julia> inrange_pairs(tree, range) # all pairs (i, j) within a certain range
3-element Vector{Tuple{Int64, Int64}}:
(1, 6)
(3, 7)
(6, 8)
julia> k = 2
julia> idxs, dists = allknn(tree, k); # k neighbors to each point in tree
julia> idxs
10-element Vector{Vector{Int64}}:
[8, 6]
[6, 5]
[9, 7]
[9, 5]
[4, 2]
...
julia> dists
10-element Vector{Vector{Float64}}:
[0.16387521055240933, 0.0380628491898866]
[0.2848492080997025, 0.20752470954341026]
[0.15886092320279743, 0.08144735280130108]
[0.6523525478275332, 0.2623835541815712]
[0.2623835541815712, 0.20752470954341026]
...
Parallel tree building
Trees are now built using multiple threads (when Julia is started with multiple threads available).
As an example:
julia> @btime KDTree(rand(3, 10^6));
# 1 thread
197.647 ms (36 allocations: 84.34 MiB)
# 10 threads:
38.420 ms (25031 allocations: 86.86 MiB)
julia> @btime BallTree(rand(3, 10^6))
# 1 thread
240.852 ms (33 allocations: 86.41 MiB)
# 10 threads:
48.513 ms (30027 allocations: 88.92 MiB)
Tree traversal
There is now an official API for traversing trees for some advanced use cases.
It extends AbstractTrees.jl so it works with the iterators defined in that package, but we also provide custom iterators because the AbstractTrees ones were order(s) of magnitude slower in some cases.
The two notebooks:
are examples of using this tree traversal API to illustrate the tree structures.
Unit support
The package now works with unitful data:
using Unitful, NearestNeighbors, StaticArrays
using Unitful.DefaultSymbols
tree = NearestNeighbors.KDTree([SVector(1m,2m,3m)]);
NearestNeighbors.nn(tree,[1m,2m,3m])
# (1, 0.0 m)
Performance improvements
Various tweaks and changes have been made to improve performance.
Below are links to PRs for anyone interested in the details:
Keep track of max distance from a point to a hyper rectangle, which can allow stopping traversal and adding all points:
Fewer square roots used for BallTree:
- optimize some hypersphere checks for minkowski metrics by KristofferC ยท Pull Request #195 ยท KristofferC/NearestNeighbors.jl ยท GitHub
- avoid some square rooting to check distances by KristofferC ยท Pull Request #197 ยท KristofferC/NearestNeighbors.jl ยท GitHub
Bumped default leaf size:
nn is now non-allocating:
Improved type stability in outputs when tree is created with matrix data (and the dimensions of points are not statically known):
Thanks for reading and please open issues / PRs if you have issues or requests for improvements.
