Hi all! I would like to share an initial release of LanceDB.jl, a Julia wrapper for LanceDB, a high-performance, embedded vector database written in Rust for AI and machine learning workflows.
What is LanceDB?
LanceDB is an open-source, serverless vector database built on top of the Lance columnar format. It is designed for storing, indexing, and querying high-dimensional vector embeddings alongside structured metadata, making it a natural fit for retrieval-augmented generation (RAG) pipelines, semantic search, and other embedding-heavy ML workflows. Unlike many vector databases, LanceDB runs embedded (no separate server process), which makes it extremely convenient for local development and research.
While playing with the excellent Julia library PromptingTools.jl I was struggling to integrate with a vector database while staying in pure Julia. I dabbled with Qdrant and then stumbled on LanceDB. It's similar to DuckDB in the sense that it runs in-process and uses an Arrow-compatible columnar format on disk. The design seems very well thought out and the performance is very good.
Fortunately, C bindings are published for the database too, which allowed me to use DuckDB.jl as a reference, and with help from Qwen/Copilot I was able to create the wrapper. The build is not very clean and requires manual steps, as I was not able to find lancedb in Yggdrasil. It works well in practice, but I recognize that the manual build step is a friction point for adoption and an obstacle to automating a CI/CD pipeline.
Help required: If anyone has experience building binary packages for Yggdrasil, I would genuinely appreciate help creating a lancedb_c_jll builder recipe. This would allow the library to be distributed as a proper JLL package and remove the manual build requirement entirely, which is really the last big hurdle before this can be registered. Please feel free to open an issue or reach out directly! I am also not sure what the General registry's policies are around packages built with the help of LLMs. Since there is not much original work here (it's just a wrapper around an existing C/Rust library), it may not be a big issue.
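For anyone considering picking this up: a Yggdrasil builder is a `build_tarballs.jl` script driven by BinaryBuilder.jl. The sketch below shows the general shape such a recipe might take for a Rust cdylib like the lancedb C bindings. Everything specific here is an assumption on my part: the source URL, commit hash, crate/package name, produced library name, and version are all placeholders that would need to be checked against the actual lancedb repository.

```julia
# build_tarballs.jl -- HYPOTHETICAL sketch of a BinaryBuilder recipe
# for lancedb's C bindings. Names, URL, hash, and version are placeholders.
using BinaryBuilder

name = "lancedb_c"
version = v"0.1.0"  # placeholder version

sources = [
    # placeholder repo/commit; must point at a real lancedb release
    GitSource("https://github.com/lancedb/lancedb.git",
              "0000000000000000000000000000000000000000"),
]

# Build script runs inside BinaryBuilder's cross-compilation sandbox;
# cargo is available because we request the Rust toolchain below.
script = raw"""
cd ${WORKSPACE}/srcdir/lancedb
cargo build --release --target ${rust_target} -p lancedb-c  # crate name is a guess
install -Dvm 755 target/${rust_target}/release/liblancedb_c.${dlext} \
    ${libdir}/liblancedb_c.${dlext}
"""

platforms = supported_platforms()

products = [
    LibraryProduct("liblancedb_c", :liblancedb_c),
]

dependencies = Dependency[]

# compilers=[:c, :rust] pulls the Rust cross-toolchain into the sandbox
build_tarballs(ARGS, name, version, sources, script, platforms, products,
               dependencies; compilers=[:c, :rust], julia_compat="1.6")
```

The recipe would then be submitted as a PR to the Yggdrasil repository, which builds and publishes the resulting lancedb_c_jll package automatically.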
Usage
using LanceDB
using Tables
using DataFrames
# 1. Connect (embedded, no server needed)
conn = Connection("/tmp/movie_db")
# 2. Create a table from any Tables.jl-compatible source
# Here we use a plain NamedTuple; a DataFrame works identically.
movies = (
id = Int32[1, 2, 3, 4, 5],
title = ["Dune", "Arrival", "Interstellar", "Ex Machina", "Annihilation"],
year = Int32[2021, 2016, 2014, 2014, 2018],
rating = Float32[7.9, 7.9, 8.6, 7.7, 6.8],
embedding = [
Float32[0.9, 0.1, 0.2, 0.0],
Float32[0.1, 0.8, 0.3, 0.2],
Float32[0.8, 0.2, 0.5, 0.1], # ← Interstellar
Float32[0.2, 0.7, 0.1, 0.5],
Float32[0.3, 0.6, 0.4, 0.3],
],
)
tbl = create_table(conn, "movies", movies)
# 3. Vector similarity search: find films whose embedding is closest
# to "Interstellar", but only from 2015 onward, and only return
# title / year / distance. Everything chains with |>.
query_vec = Float32[0.8, 0.2, 0.5, 0.1]
result =
vector_search(tbl, query_vec, "embedding") |>
filter_where("year > 2015") |>
select_cols(["title", "year", "_distance"]) |>
limit(3) |>
execute
# 4. QueryResult is Tables.jl-compliant, so it drops straight into a DataFrame
df = DataFrame(Tables.columns(result))
println(df)
# │ Row │ title   │ year  │ _distance │
# │     │ String  │ Int32 │ Float32   │
# ├─────┼─────────┼───────┼───────────┤
# │ 1   │ Dune    │ 2021  │ 0.009     │
# │ 2   │ Arrival │ 2016  │ 0.510     │
# 5. The expression DSL lets you build filters programmatically
# (useful when filter conditions come from user input or config)
high_rated_recent =
(col("rating") >= lit(7.8f0)) & (col("year") > lit(2015))
df2 = DataFrame(Tables.columns(
query(tbl) |> filter_expr(high_rated_recent) |> execute
))
# 6. Tables survive process restarts: LanceDB persists to disk
close(tbl)
close(conn)
conn2 = Connection("/tmp/movie_db")
tbl2 = open_table(conn2, "movies")
println(count_rows(tbl2)) # 5, data is still there
println(table_version(tbl2)) # version history is preserved too
The key composability story: create_table accepts any Tables.jl source (NamedTuple, DataFrame, Arrow table, ...), and execute returns a QueryResult that is itself Tables.jl-compliant, so it flows directly into DataFrames.jl, CSV.jl, Arrow.jl, or any other sink, with no conversion glue required.
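To make the "any sink" claim concrete, here is a small sketch (untested, and assuming the API shown above) of streaming a query result straight into CSV and Arrow files, then round-tripping the Arrow file back into a new table. `CSV.write` and `Arrow.write` accept arbitrary Tables.jl sources, so no intermediate conversion is needed:

```julia
using LanceDB
using CSV, Arrow

conn = Connection("/tmp/movie_db")
tbl  = open_table(conn, "movies")

# execute returns a Tables.jl table, so any sink consumes it directly
result = query(tbl) |> limit(100) |> execute
CSV.write("movies.csv", result)      # stream straight to CSV
Arrow.write("movies.arrow", result)  # or to an Arrow IPC file

# Round-trip: an Arrow.Table is itself a Tables.jl source,
# so it can seed a new LanceDB table with no glue code.
arrow_tbl = Arrow.Table("movies.arrow")
create_table(conn, "movies_copy", arrow_tbl)
```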
There is a QUICKSTART.md in the repository that explains how the code can be used.