Hi all! I would like to share an initial release of LanceDB.jl, a Julia wrapper for LanceDB, a high-performance, embedded vector database written in Rust for AI and machine learning workflows.
What is LanceDB?
LanceDB is an open-source, serverless vector database built on top of the Lance columnar format. It is designed for storing, indexing, and querying high-dimensional vector embeddings alongside structured metadata, making it a natural fit for retrieval-augmented generation (RAG) pipelines, semantic search, and other embedding-heavy ML workflows. Unlike many vector databases, LanceDB runs embedded (no separate server process), which makes it extremely convenient for local development and research.
While playing with the excellent Julia library PromptingTools.jl I was struggling to integrate with a vector database while staying in pure Julia. I dabbled with Qdrant and then stumbled on LanceDB. It's similar to DuckDB in the sense that it runs in-process and uses an Arrow-compatible columnar format on disk. The design seems very well thought out and the performance is very good.
Fortunately, C bindings are published for the database too, which allowed me to use DuckDB.jl as a reference, and with help from Qwen/Copilot I was able to create the wrapper. The build is not very clean and requires manual steps, as I was not able to find lancedb in Yggdrasil. It works well in practice, but I recognize that the manual build step is a friction point for adoption and an obstacle to automating a CI/CD pipeline.
Help required: If anyone has experience building binary packages for Yggdrasil, I would genuinely appreciate help creating a lancedb_c_jll builder recipe. This would allow the library to be distributed as a proper JLL package and remove the manual build requirement entirely, which is really the last big hurdle before this can be registered. Please feel free to open an issue or reach out directly! I am also not sure what the General registry's policies are around packages built with the help of LLMs. Since there is not much original work here (it's just a wrapper around an existing C/Rust library), it may not be a big issue.
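For anyone considering picking this up: a Yggdrasil builder is a `build_tarballs.jl` script driven by BinaryBuilder.jl. The sketch below shows the general shape such a recipe might take for a Rust cdylib like the lancedb C bindings. Everything specific here is an assumption on my part: the source URL, commit hash, crate/package name, produced library name, and version are all placeholders that would need to be checked against the actual lancedb repository.

```julia
# build_tarballs.jl -- HYPOTHETICAL sketch of a BinaryBuilder recipe
# for lancedb's C bindings. Names, URL, hash, and version are placeholders.
using BinaryBuilder

name = "lancedb_c"
version = v"0.1.0"  # placeholder version

sources = [
    # placeholder repo/commit; must point at a real lancedb release
    GitSource("https://github.com/lancedb/lancedb.git",
              "0000000000000000000000000000000000000000"),
]

# Build script runs inside BinaryBuilder's cross-compilation sandbox;
# cargo is available because we request the Rust toolchain below.
script = raw"""
cd ${WORKSPACE}/srcdir/lancedb
cargo build --release --target ${rust_target} -p lancedb-c  # crate name is a guess
install -Dvm 755 target/${rust_target}/release/liblancedb_c.${dlext} \
    ${libdir}/liblancedb_c.${dlext}
"""

platforms = supported_platforms()

products = [
    LibraryProduct("liblancedb_c", :liblancedb_c),
]

dependencies = Dependency[]

# compilers=[:c, :rust] pulls the Rust cross-toolchain into the sandbox
build_tarballs(ARGS, name, version, sources, script, platforms, products,
               dependencies; compilers=[:c, :rust], julia_compat="1.6")
```

The recipe would then be submitted as a PR to the Yggdrasil repository, which builds and publishes the resulting lancedb_c_jll package automatically.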
Usage
using LanceDB
using Tables
using DataFrames
# 1. Connect (embedded, no server needed)
conn = Connection("/tmp/movie_db")
# 2. Create a table from any Tables.jl-compatible source
# Here we use a plain NamedTuple; a DataFrame works identically.
movies = (
id = Int32[1, 2, 3, 4, 5],
title = ["Dune", "Arrival", "Interstellar", "Ex Machina", "Annihilation"],
year = Int32[2021, 2016, 2014, 2014, 2018],
rating = Float32[7.9, 7.9, 8.6, 7.7, 6.8],
embedding = [
Float32[0.9, 0.1, 0.2, 0.0],
Float32[0.1, 0.8, 0.3, 0.2],
Float32[0.8, 0.2, 0.5, 0.1], # ← Interstellar
Float32[0.2, 0.7, 0.1, 0.5],
Float32[0.3, 0.6, 0.4, 0.3],
],
)
tbl = create_table(conn, "movies", movies)
# 3. Vector similarity search: find films whose embedding is closest
# to "Interstellar", but only from 2015 onward, and only return
# title / year / distance. Everything chains with |>.
query_vec = Float32[0.8, 0.2, 0.5, 0.1]
result =
vector_search(tbl, query_vec, "embedding") |>
filter_where("year > 2015") |>
select_cols(["title", "year", "_distance"]) |>
limit(3) |>
execute
# 4. QueryResult is Tables.jl-compliant, so it drops straight into a DataFrame
df = DataFrame(Tables.columns(result))
println(df)
# │ Row │ title   │ year  │ _distance │
# │     │ String  │ Int32 │ Float32   │
# ├─────┼─────────┼───────┼───────────┤
# │ 1   │ Dune    │ 2021  │ 0.009     │
# │ 2   │ Arrival │ 2016  │ 0.510     │
# 5. The expression DSL lets you build filters programmatically
# (useful when filter conditions come from user input or config)
high_rated_recent =
(col("rating") >= lit(7.8f0)) & (col("year") > lit(2015))
df2 = DataFrame(Tables.columns(
query(tbl) |> filter_expr(high_rated_recent) |> execute
))
# 6. Tables survive process restarts: LanceDB persists to disk
close(tbl)
close(conn)
conn2 = Connection("/tmp/movie_db")
tbl2 = open_table(conn2, "movies")
println(count_rows(tbl2)) # 5, data is still there
println(table_version(tbl2)) # version history is preserved too
The key composability story: create_table accepts any Tables.jl source (NamedTuple, DataFrame, Arrow table, ...), and execute returns a QueryResult that is itself Tables.jl-compliant, so it flows directly into DataFrames.jl, CSV.jl, Arrow.jl, or any other sink, with no conversion glue required.
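To make the "any sink" claim concrete, here is a small sketch (untested, and assuming the API shown above) of streaming a query result straight into CSV and Arrow files, then round-tripping the Arrow file back into a new table. `CSV.write` and `Arrow.write` accept arbitrary Tables.jl sources, so no intermediate conversion is needed:

```julia
using LanceDB
using CSV, Arrow

conn = Connection("/tmp/movie_db")
tbl  = open_table(conn, "movies")

# execute returns a Tables.jl table, so any sink consumes it directly
result = query(tbl) |> limit(100) |> execute
CSV.write("movies.csv", result)      # stream straight to CSV
Arrow.write("movies.arrow", result)  # or to an Arrow IPC file

# Round-trip: an Arrow.Table is itself a Tables.jl source,
# so it can seed a new LanceDB table with no glue code.
arrow_tbl = Arrow.Table("movies.arrow")
create_table(conn, "movies_copy", arrow_tbl)
```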
There is a QUICKSTART.md in the repository that explains how the code can be used.