Why Is There a Significant Slow-down with DuckDB.jl Bridging?

I did an interesting little exploration with DuckDB.jl + bridging to another local database of a different SQL flavor today (SQLite). Here’s what I did:

using BenchmarkTools
using DBInterface
using DuckDB
using HealthSampleData
using OMOPCDMCohortCreator
using SQLite

#=

Generate local dataset for testing
and configure connection details

=#
conn = SQLite.DB(Eunomia())

GenerateDatabaseDetails(:sqlite, "")
GenerateTables(conn)

#= 

Execute and benchmark some test queries

=# 

patients = GetDatabasePersonIDs(conn)
@btime GetPatientAgeGroups($patients[1:100], $conn)

Here, the benchmark gives about 439 microseconds on my computer running julia with 12 threads.

#=

Bridging over to DuckDB

=#
conn = DBInterface.connect(DuckDB.DB)

DuckDB.execute(conn, "INSTALL sqlite;")
DuckDB.execute(conn, "LOAD sqlite;")
DuckDB.execute(conn, 
       """
       ATTACH '$(Eunomia())' (TYPE SQLITE);
       """
)

DuckDB.execute(conn, 
       """
       USE eunomia;
       """
)

#=

Running queries and benchmarks again

=#
patients = GetDatabasePersonIDs(conn)
@btime GetPatientAgeGroup($patients[1:100], $conn);

This time it is about 3.5 milliseconds on my machine. This was not what I expected based on documentation of bridging from DuckDB. I expected to see a speed-up but alas, instead, a large slow down. I tried this with a PostgreSQL instance on my machine and similar slow-downs were seen.

This example is fully replicable as long as you install the packages at top and execute things line by line. I’d be curious where these slow-downs are occurring. Any thoughts?

Querying a relatively small set rows will usually be faster using the native sqlite connector since it optimized for row-wise queries. Also the sqlite database is loosely typed, while duckdb is typed and sqlite_scanner has to guess the datatypes. In this example, sqlite is storing the person_id field as a REAL, although it represents a INT. The performance in duckdb would be much better if you imported the data to a columnar format with types, such as a local duckdb table with “CREATE TABLE persons AS select * from eunomia.person” or copy the sqlite to a parquet file “COPY eunomia.person TO ‘person.parquet’”