I’ve been trying to come to grips with some very weird behaviour when using CSV.jl and DuckDB.jl in the same script. Basically I want to parse some large CSV files and store their data in a DuckDB database file using the Appender API.
However, the behaviour I’m seeing is that the appending sometimes fails with an error and nothing ends up in the .db file. But the behaviour is sensitive to extremely weird aspects of the script. I managed to reduce it to the full script shown all the way at the bottom (including reducing the schema and data used), but here’s the main part:
function go(db_file, csv_file)
db = DuckDB.DB(db_file)
DBInterface.execute(db, "create table if not exists jobs (year int not null)")
appender = DuckDB.Appender(db, "jobs")
# CSV.read call needs to be after creating the Appender for the bug to show up...
println(csv_file)
# Note: df result is not used at all!
df = CSV.read(csv_file, DataFrame, delim="\t")
#println(df)
row = (2026,)
for (idx, value) in enumerate(row)
res = DuckDB.append(appender, value)
check_for_error(appender, "append $idx", res)
end
#DuckDB.end_row(appender)
res = DuckDB.duckdb_appender_end_row(appender.handle)
check_for_error(appender, "end_row", res)
res = DuckDB.duckdb_appender_close(appender.handle)
check_for_error(appender, "close", res)
end
go(db_file, csv_file)
If I run this it shows an error from the appender API and the .db file contains no rows:
melis@blackbox 08:16:~/concepts/snellius-job-stats$ j --project -t1 bug_db_import_csv_file.jl test.db doh2.csv
doh2.csv
Appender error (close): Appender: Attempting to flush data to a closed connection
melis@blackbox 08:16:~/concepts/snellius-job-stats$ duckdb test.db .dump
BEGIN TRANSACTION;
CREATE TABLE jobs("year" INTEGER NOT NULL);;
COMMIT;
But note that in this testcase script the data read from the CSV file into df is not used at all to fill the database (row is hardcoded here, while in my original production code df was iterated over). However, the CSV.read() call is needed for the DuckDB appender error to show up. Moving the CSV.read() call to happen before DuckDB.DB() or just before DuckDB.Appender() makes the error go away (and then data ends up in the database file).
I have been able to reduce the testcase and data from the original production case. But this one has been extremely elusive to pin down, with seemingly unrelated code having an influence on the occurrence of the error, such as if the df value is printed, or using CSV.read("doh2.csv", ... instead of CSV.read(csv_file, ... (where csv_file == "doh2.csv", yes, really).
This feels like some kind of memory corruption or even GC bug to me. I could use some help which step(s) would make sense to further figure this out. I’m not sure if the issue is in DuckDB, CSV, or even Julia itself. I can build Julia from source with some debugging flags enabled, but would need a bit of advice on what to use, and any help is appreciated.
Some more details:
The CSV file only contains 258 lines, and is around 115kB, but was derived from around 38k lines and is the smallest file I could create that shows the issue.
Julia info:
Julia Version 1.12.6
Commit 15346901f00 (2026-04-09 19:20 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
GC: Built with stock GC
Threads: 8 default, 1 interactive, 8 GC (on 8 virtual cores)
Environment:
JULIA_PKG_USE_CLI_GIT = true
JULIA_NUM_THREADS = auto
Status `~/concepts/snellius-job-stats/Project.toml`
[68d1b5e8] AdaptiveKDTrees v0.1.0
[c7e460c6] ArgParse v1.2.0
[69666777] Arrow v2.8.1
[6e4b80f9] BenchmarkTools v1.8.0
[336ed68f] CSV v0.10.16
[159f3aea] Cairo v1.1.1
[13f3f980] CairoMakie v0.15.10
[324d7699] CategoricalArrays v1.1.1
[8be319e6] Chain v1.0.0
[35d6a980] ColorSchemes v3.31.0
[861a8166] Combinatorics v1.1.0
[a93c6f00] DataFrames v1.8.2
[d2f5444f] DuckDB v1.5.2
[cf6cc811] FunSQL v0.15.0
[e9467ef8] GLMakie v0.13.10
[1ecd5474] GraphMakie v0.6.3
[86223c79] Graphs v1.14.0
⌅ [cd3eb016] HTTP v1.11.0
[dc4a2547] J3 v0.2.5 `~/concepts/j3`
[682c06a0] JSON v1.6.0
[442fdcdd] Measures v0.3.3
⌅ [fa8bd995] MetaGraphsNext v0.7.5
[39abe10b] MySQL v1.5.1
[c020b1a1] NaturalSort v1.0.0
[b8a86587] NearestNeighbors v0.4.27
[46757867] NetworkLayout v0.4.10
[9b87118b] PackageCompiler v2.3.0
[626c502c] Parquet v0.8.6
[995b91a9] PlotUtils v1.4.4
[91a5bcdd] Plots v1.41.6
[08abe8d2] PrettyTables v3.3.2
[c4c386cf] Rsvg v1.1.0
[e6c19c8d] SGtSNEpi v0.4.0
[0aa819cd] SQLite v1.8.0
[8fd88ec8] SankeyPlots v0.3.0
[10745b16] Statistics v1.11.1
[2913bbd2] StatsBase v0.34.11
[f3b207a7] StatsPlots v0.15.8
[b8865327] UnicodePlots v3.8.3
[ade2ca70] Dates v1.11.0
[de0858da] Printf v1.11.0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`
Full script:
using CSV, DataFrames, DuckDB
if length(ARGS) < 2 || (length(ARGS) == 1 && ARGS[1] == "-h")
println("$(PROGRAM_FILE) file.db file.csv[.gz]")
println()
exit(-1)
end
db_file = ARGS[1]
csv_file = ARGS[2]
function check_for_error(appender, msg, res)
if res == DuckDB.DuckDBSuccess
return
end
# XXX not defined
#error_data = DuckDB.duckdb_appender_error_data(appender.handle)
#error_ptr = error_data.internal_pointer
#DuckDB.duckdb_destroy_error_data(error_data)
# XXX is deprecated
error_ptr = DuckDB.duckdb_appender_error(appender.handle)
if error_ptr == C_NULL
error_message = "unknown error"
else
# XXX Sigh, https://github.com/duckdb/duckdb/pull/7427
error_message = unsafe_string(error_ptr)
end
println("Appender error ($msg): $error_message")
end
function go(db_file, csv_file)
db = DuckDB.DB(db_file)
DBInterface.execute(db, "create table if not exists jobs (year int not null)")
appender = DuckDB.Appender(db, "jobs")
# CSV.read call needs to be after creating the Appender for the bug to show up...
println(csv_file)
df = CSV.read(csv_file, DataFrame, delim="\t")
#println(df)
row = (2026,)
for (idx, value) in enumerate(row)
res = DuckDB.append(appender, value)
check_for_error(appender, "append $idx", res)
end
#DuckDB.end_row(appender)
res = DuckDB.duckdb_appender_end_row(appender.handle)
check_for_error(appender, "end_row", res)
#res = DuckDB.duckdb_appender_flush(appender.handle)
#check_for_error(appender, "flush end of $fname", res)
#DuckDB.flush(appender)
#res = DuckDB.duckdb_appender_flush(appender.handle)
#check_for_error(appender, "flush", res)
#DuckDB.close(appender)
res = DuckDB.duckdb_appender_close(appender.handle)
check_for_error(appender, "close", res)
end
go(db_file, csv_file)