Are there any vector database implementations in Julia?
Looking for a data structure that stores a collection of vectors of real numbers and maps a query vector to an ordered list of the n closest vectors to the query (or all vectors within a certain distance). Distance = cosine similarity. All vectors have the same length.
Are you looking for something like NearestNeighbors.jl? You can get persistence by saving it to disk with JLD2.jl or another generic format for storing Julia objects.
NearestNeighbors.jl answers the question I asked, but domluna’s approach is much faster than NearestNeighbors.BruteTree, and the other NearestNeighbors trees segfault on construction, so…
Hand-rolled linear search it is! (And the constant factor for a naive version of domluna’s approach is already about 0.05 ns × vector size × number of vectors, which is outstanding!)
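For anyone landing here later, this is a minimal sketch of what a hand-rolled linear search for the original question (real-valued vectors, cosine similarity) could look like. It is not domluna’s code. The helper names `normalize_columns`, `cosine_knn` and `cosine_inrange` are made up for illustration, and it assumes the vectors are stored as the columns of a matrix and that none of them is all zeros.

```julia
using LinearAlgebra

# Hypothetical helper: scale every column of `X` to unit length so that
# cosine similarity reduces to a plain dot product.
normalize_columns(X::AbstractMatrix) = X ./ sqrt.(sum(abs2, X; dims=1))

# Hypothetical helper: indices and cosine similarities of the `k` columns of
# `Xn` (already unit length) closest to `query`, most similar first.
function cosine_knn(Xn::AbstractMatrix, query::AbstractVector, k::Integer)
    sims = Xn' * (query ./ norm(query))         # one dot product per stored vector
    idx = partialsortperm(sims, 1:k; rev=true)  # top k without sorting everything
    return collect(idx), sims[idx]
end

# Hypothetical helper: indices of all columns whose cosine similarity to
# `query` is at least `threshold`.
function cosine_inrange(Xn::AbstractMatrix, query::AbstractVector, threshold::Real)
    sims = Xn' * (query ./ norm(query))
    return findall(>=(threshold), sims)
end

# Four 2-d vectors, one per column: (1,0), (0,1), (1,1), (-1,0).
X = normalize_columns(Float32[1 0 1 -1; 0 1 1 0])
idx, sims = cosine_knn(X, Float32[1, 0.1], 2)
```

Normalizing once up front means each query costs one matrix-vector product plus a partial sort, so the work stays linear in the number of stored vectors.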
There was an intrinsic I wasn’t using, so it’s even faster now. On my MacBook Air M1 it takes ~1 ms for a search over 1M vectors using StaticArrays.
I might look into an HNSW implementation. I think if it’s focused on just this binary vector comparison use case, we might be able to get something that takes just a few microseconds, or possibly even less if we’re doing a logarithmic number of comparisons and each comparison is <20 ns (on my machine).
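The thread doesn’t include the binary comparison code itself, so here is a rough sketch of what such a comparison might look like, under some loud assumptions: each vector is quantized to 256 bits by the sign of its components, packed into four `UInt64` words, and compared by Hamming distance. The names `Words`, `pack_signs`, `hamming` and `hamming_knn` are hypothetical, and this uses plain tuples rather than StaticArrays.

```julia
# Assumption: 256-bit binary vectors, packed into four 64-bit words.
const Words = NTuple{4,UInt64}

# Hypothetical helper: set bit i when the i-th component is positive.
function pack_signs(v::AbstractVector{<:Real})
    length(v) == 256 || throw(ArgumentError("expected 256 components"))
    return ntuple(4) do w
        word = zero(UInt64)
        for b in 0:63
            v[64 * (w - 1) + b + 1] > 0 && (word |= one(UInt64) << b)
        end
        word
    end
end

# Hamming distance: XOR the words, then count the set bits in each one.
hamming(a::Words, b::Words) = sum(count_ones.(a .⊻ b))

# Hypothetical helper: linear scan returning the `k` nearest stored vectors
# by Hamming distance, closest first.
function hamming_knn(db::Vector{Words}, query::Words, k::Integer)
    dists = [hamming(x, query) for x in db]
    idx = partialsortperm(dists, 1:k)
    return collect(idx), dists[idx]
end

a = ones(256)                   # every bit set
b = vcat(-ones(3), ones(253))   # differs from `a` in 3 positions
c = -ones(256)                  # differs from `a` in all 256 positions
db = [pack_signs(c), pack_signs(b), pack_signs(a)]
idx, dists = hamming_knn(db, pack_signs(a), 2)
```

An HNSW index would replace the linear scan in `hamming_knn` with a graph traversal, while the per-comparison cost would still come from something like `hamming`.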