Vector databases

Are there any vector database implementations in Julia?

I'm looking for a data structure that stores a collection of real-valued vectors, all of the same length, and maps a query vector to an ordered list of the n vectors closest to it (or to all vectors within a certain distance). Closeness is measured by cosine similarity.
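To make the requirement concrete, here is a minimal brute-force sketch of the interface described above. It is written in Python purely for illustration and is not taken from any existing package; the names `nearest` and `within` are hypothetical, and it assumes no vector is all zeros (otherwise the cosine is undefined).

```python
import heapq
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vectors, query, n):
    # Return (similarity, index) pairs for the n most similar vectors,
    # most similar first. Hypothetical helper name.
    scored = ((cosine_similarity(v, query), i) for i, v in enumerate(vectors))
    return heapq.nlargest(n, scored)

def within(vectors, query, min_similarity):
    # Return (similarity, index) pairs for every vector whose similarity
    # to the query is at least min_similarity, most similar first.
    scored = ((cosine_similarity(v, query), i) for i, v in enumerate(vectors))
    return sorted((s for s in scored if s[0] >= min_similarity), reverse=True)
```

Both functions scan every stored vector, so a query costs time proportional to the number of vectors times their length. A tree or graph index would aim to beat exactly that.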


Are you looking for something like NearestNeighbors.jl? You can get persistence by saving it to disk with JLD2.jl or another generic format for storing Julia objects.


Not exactly a vector database, but there's a discussion about domluna/tinyrag (on GitHub) in the generative-ai channel on Slack.


NearestNeighbors.jl answers the question I asked, but domluna's approach is much faster than NearestNeighbors.BruteTree, and the other NearestNeighbors trees segfault on construction, so…

Hand-rolled linear search it is! (And the constant factor for a naive version of domluna's approach is already about 0.05 ns * vector size * number of vectors, which is outstanding!)

There was an intrinsic I wasn't using, so it's even faster now. On my MacBook Air M1 it takes ~1 ms for a search over 1M vectors using StaticArrays.
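For anyone following along, a rough sketch of what a linear search over binary vectors can look like. This is an illustration under assumptions, not the actual tinyrag code: it assumes each vector is bit-packed into an integer and compared by Hamming distance (XOR followed by a population count), and the function names are made up. It is in Python only to show the idea; the timings above come from the Julia version.

```python
import heapq

def popcount(x):
    # Number of set bits. Hardware usually exposes this as a single
    # instruction; here it is done portably via the binary string.
    return bin(x).count("1")

def hamming_distance(a, b):
    # XOR leaves a 1 wherever the two bit patterns differ.
    return popcount(a ^ b)

def linear_search(codes, query, n):
    # Scan every stored code and return the n closest as
    # (distance, index) pairs, closest first; ties go to the lower index.
    scored = ((hamming_distance(code, query), i) for i, code in enumerate(codes))
    return heapq.nsmallest(n, scored)
```

For example, `linear_search([0b1111, 0b0000, 0b1010, 0b1110], 0b1011, 2)` scans all four codes and keeps the two with the fewest differing bits.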

I might look into an HNSW implementation. I think if it's focused on just this binary vector comparison use case, we might be able to get something that takes just a few microseconds, or possibly even less if we're doing a logarithmic number of comparisons and each comparison is <20 ns (on my machine).
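A back-of-the-envelope version of that estimate, as a sketch in Python. The model is an assumption: roughly log2(N) levels, with a fixed number of comparisons per level. The `comparisons_per_level` parameter and the value 16 are hypothetical, chosen only to show how the constant moves the result.

```python
import math

def hnsw_estimate_ns(n_vectors, ns_per_comparison, comparisons_per_level=1):
    # Assumed cost model: ceil(log2(N)) levels, each costing a fixed
    # number of comparisons. Not a measurement of any real index.
    levels = math.ceil(math.log2(n_vectors))
    return levels * comparisons_per_level * ns_per_comparison

# 1M vectors at 20 ns per comparison, one comparison per level
print(hnsw_estimate_ns(1_000_000, 20))      # 400 ns
# Same, with a hypothetical 16 comparisons per level
print(hnsw_estimate_ns(1_000_000, 20, 16))  # 6400 ns
```

So under this model the "few microseconds" figure corresponds to a handful of comparisons per level, and sub-microsecond would need close to one.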
