In Nearest Neighbours are trees other than BruteTree worth it for sets with between n=2^3 and n=2^20ish for KNN in 2D cartesian space

I know the colossal @kristoffer.carlsson has been busy recently but I wonder if anyone else knows the A to my Q…

I am running a teaching group on grouping waveforms. The waveform side of it is okay but I want to understand trees better so I can lay out types of wave on a 2D graph and then (try to) group them using KNN. there could be quite a lot of waves: between 8 and whatever 2^20 or so is.

I got a bit stumped by understanding how trees make KNN more efficient. The NearestNeighbors package seems to be aimed at high n for n-dimensional distributions and I think I will be keeping things 2D.

There are only two types of waves btw. Say… “good” and “bad” sounds. I want demonstrate how KKN predicts good or bad.

The entropy of the 2d distribution could be really random, or very segregated. Depends.

If I only use BruteTree in my high n low dimensional model am I missing some huge optimisation?
Or should I chill about that?

Yes, KDTrees are good in low dimensions and get (significantly) worse with higher dimensions. Using a proper tree should be a big win. But just benchmarking should show the truth for your specific use case.

4 Likes

Hi Kris!

Big wins are kind that I like most.

Keep up the good work.