I did an interesting little exploration with DuckDB.jl + bridging to another local database of a different SQL flavor today (SQLite). Here’s what I did:
using BenchmarkTools
using DBInterface
using DuckDB
using HealthSampleData
using OMOPCDMCohortCreator
using SQLite
#=
Generate local dataset for testing
and configure connection details
=#
conn = SQLite.DB(Eunomia())
GenerateDatabaseDetails(:sqlite, "")
GenerateTables(conn)
#=
Execute and benchmark some test queries
=#
patients = GetDatabasePersonIDs(conn)
@btime GetPatientAgeGroups($patients[1:100], $conn)
Here, the benchmark gives about 439 microseconds on my computer running julia with 12 threads.
#=
Bridging over to DuckDB
=#
conn = DBInterface.connect(DuckDB.DB)
DuckDB.execute(conn, "INSTALL sqlite;")
DuckDB.execute(conn, "LOAD sqlite;")
DuckDB.execute(conn,
"""
ATTACH '$(Eunomia())' (TYPE SQLITE);
"""
)
DuckDB.execute(conn,
"""
USE eunomia;
"""
)
#=
Running queries and benchmarks again
=#
patients = GetDatabasePersonIDs(conn)
@btime GetPatientAgeGroup($patients[1:100], $conn);
This time it is about 3.5 milliseconds on my machine. This was not what I expected based on documentation of bridging from DuckDB. I expected to see a speed-up but alas, instead, a large slow down. I tried this with a PostgreSQL instance on my machine and similar slow-downs were seen.
This example is fully replicable as long as you install the packages at top and execute things line by line. I’d be curious where these slow-downs are occurring. Any thoughts?