GPU based Nearest Neighbor (brute force)

Hi,

Is there a maintained Julia package that can perform high-dimensional nearest-neighbor queries on the GPU, e.g., 1000s of queries against 10s of thousands of database vectors?

I’m aware of the wrapper for FAISS, but I wonder if there’s another Julia-native library (e.g., using Kernel Abstractions).

Regards.

1 Like

The place for nn searches is JuliaNeighbors · GitHub . I do not think any of the libraries has GPU support however it can probably be added in a straightforward manner.

In the Distances package, the function pairwise calculates all pairwise distances. The result can be reduced to the nearest neighbours.
Since the calculation for appropriate metrics uses matrix products, they are GPU accelerated. Not sure this gains much compared to a brute-force double for loop.

Perhaps it’s possible to run NearestNeighbors.jl using Reactant.jl? That would make it trivial to run on GPU.

You should clarify your setting a little more.

What do you want? Nearest neighbor, K nearest neighbor, approximate KNN, ???

Euclidean distance or something else?

“High-dimensional” is a meaningless term. Give us the number of zeros of the dimension.

By dimension, I assume you mean extrinsic dimension, e.g. “oh I have 5000 long Float32 vectors”.

There is also an intrinsic dimension, e.g. “…but the data tends lies on an effectively 4.7 dimensional data manifold” (which is not a manifold; datasets tend to have fractal and scale-dependent dimension, like e.g. the space of sequences with sum(xn^2/n^2 for (n,xn) in enumerate(x)) < 1 with respect to the euclidean norm).

Intrinsic dimension tends to determine the performance of non-brute-force approaches. But the intrinsic dimension may not be apparent.

You have not told us whether you have single queries or batched queries, and whether you care about latency or throughput or power consumption. (If you have single queries and care about throughput, then batch them!)

You have not told us why you want brute-force and why you want GPU.

All that being said… ~1e3 queries against ~1e4 points is tiny. You should not need a GPU for that, this is a job for a toaster. And if you use GPU, you should not need a fancy library, this is one matmul.

cos similarity or Euclidean distance, 128 to 512 dimensional Float 32 embeddings batched. query, roughly matching 10,000 against 10,000; 2-nearest neigbors.

1 Like