How to insert nested/composite types into DuckDB from a DataFrame?

Hi all, it seems DuckDB can support composite types as in Redirecting…

I want to know how to insert into columns with one of these types from a DataFrame. I’ve tried the Appender API and directly using prepared statements, none of which seems to work. Can someone help me? A MWE is below:

using DuckDB, DataFrames

function load_table!(db::T, df::DataFrame, name) where {T<:DuckDB.DB}
    sch = DataFrame(DBInterface.execute(db, "DESCRIBE $(name);"))
    sch_cols = sch[!, :column_name]
    @assert Set(names(df)) == Set(sch_cols)
    appender = DuckDB.Appender(db, name)
    for i in eachrow(df)
        for j in sch_cols
            # database columns are ordered
            DuckDB.append(appender, i[j])
        end
        DuckDB.end_row(appender)
    end
    DuckDB.flush(appender)
    DuckDB.close(appender)
end

db_mem = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(db_mem, "CREATE TABLE tab1 (id INTEGER PRIMARY KEY, vals INTEGER[]);")
tab1_df = DataFrame(id=1:10, vals=fill([1,2,3],10))
load_table!(db_mem, tab1_df, "tab1") # errors

stmt = DBInterface.prepare(db_mem, "INSERT INTO tab1 VALUES(:id, :vals)") 
DBInterface.executemany(stmt, (id=tab1_df.id, vals=tab1_df.vals)) # errors

stmt = DBInterface.prepare(db_mem, "INSERT INTO tab1 VALUES(?, ?)") 
DBInterface.executemany(stmt, (id=tab1_df.id, vals=tab1_df.vals)) # errors

Looks like it’s not supported at the moment (same issue for the Rust client: Appender support for arrays (or alternative) · Issue #422 · duckdb/duckdb-rs · GitHub).

As a workaround, you can:

# flatten the df
tab1_df = DataFrame(id=1:10, vals=fill([1,2,3],10))
flat_df = flatten(tab1_df, [:vals])

# register the data frame or create the table in whatever way
DuckDB.register_data_frame(db_mem, flat_df, "flat_tab1")

# create aggregated table 
DuckDB.query(
  db_mem,
  "CREATE OR REPLACE TABLE tab1 AS
  SELECT
    id,
    ARRAY_AGG(vals) AS vals,
  FROM flat_tab1
  GROUP BY
    id",
)

# trust buy verify
DuckDB.query(db_mem, "FROM tab1") |> DataFrame

Please let me know if this is helpful.

PS. If you can add the errors you’re having, it will help more people find this issue (especially in the future)

Thanks! I’m going to look into the appender API and see if it could be adapted to suit this use case.

The errors:

From append

ERROR: unsupported type for append
Stacktrace:
 [1] append(appender::DuckDB.Appender, val::Vector{Int64})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/appender.jl:108
 [2] load_table!(db::DuckDB.DB, df::DataFrame, name::String)
   @ Main ./REPL[16]:9
 [3] top-level scope
   @ REPL[20]:1

From prepared statements:

ERROR: unsupported type for bind
Stacktrace:
 [1] duckdb_bind_internal(stmt::DuckDB.Stmt, i::Int64, val::Vector{Int64})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/statement.jl:72
 [2] bind_parameters(stmt::DuckDB.Stmt, params::DBInterface.LazyIndex{@NamedTuple{id::Vector{…}, vals::Vector{…}}})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/statement.jl:78
 [3] execute(stmt::DuckDB.Stmt, params::DBInterface.LazyIndex{@NamedTuple{id::Vector{Int64}, vals::Vector{Vector{Int64}}}})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/result.jl:709
 [4] execute
   @ ~/.julia/packages/DuckDB/5Gcwk/src/result.jl:872 [inlined]
 [5] #6
   @ ~/.julia/packages/DBInterface/nQcsk/src/DBInterface.jl:206 [inlined]
 [6] transaction(f::DBInterface.var"#6#8"{DuckDB.Stmt, @NamedTuple{…}, Int64}, con::DuckDB.Connection)
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/transaction.jl:5
 [7] executemany(stmt::DuckDB.Stmt, params::@NamedTuple{id::Vector{Int64}, vals::Vector{Vector{Int64}}})
   @ DBInterface ~/.julia/packages/DBInterface/nQcsk/src/DBInterface.jl:203
 [8] top-level scope
   @ REPL[24]:1