In Nearest Neighbours are trees other than BruteTree worth it for sets with between n=2^3 and n=2^20ish for KNN in 2D cartesian space

AlanAylmer · March 27, 2022, 5:27pm

I know the colossal @kristoffer.carlsson has been busy recently but I wonder if anyone else knows the A to my Q…

I am running a teaching group on grouping waveforms. The waveform side of it is okay but I want to understand trees better so I can lay out types of wave on a 2D graph and then (try to) group them using KNN. there could be quite a lot of waves: between 8 and whatever 2^20 or so is.

I got a bit stumped by understanding how trees make KNN more efficient. The NearestNeighbors package seems to be aimed at high n for n-dimensional distributions and I think I will be keeping things 2D.

There are only two types of waves btw. Say… “good” and “bad” sounds. I want demonstrate how KKN predicts good or bad.

The entropy of the 2d distribution could be really random, or very segregated. Depends.

If I only use BruteTree in my high n low dimensional model am I missing some huge optimisation?
Or should I chill about that?

kristoffer.carlsson · March 28, 2022, 3:58pm

Yes, KDTrees are good in low dimensions and get (significantly) worse with higher dimensions. Using a proper tree should be a big win. But just benchmarking should show the truth for your specific use case.

AlanAylmer · March 28, 2022, 6:15pm

Hi Kris!

Big wins are kind that I like most.

Keep up the good work.

Topic		Replies	Views
How to compute neighbour network using NearestNeighbors.jl Specific Domains	4	1515	January 23, 2017
KDTree in NearestNeighbors.jl with Rectangles as leaf nodes Specific Domains	1	446	March 26, 2019
[ANN] VPTrees.jl - nearest neighbor search in metric spaces Package Announcements	20	2689	January 30, 2020
MLJ KNNRegressor BruteTree broken for predict-- load error brutetree has no field indices...? Machine Learning mlj	4	420	January 22, 2022
Suggestion needed: Tree like structure for adaptive binning of multidim data Statistics	6	575	October 2, 2020

In Nearest Neighbours are trees other than BruteTree worth it for sets with between n=2^3 and n=2^20ish for KNN in 2D cartesian space

Related topics