How to insert nested/composite types into DuckDB from a DataFrame?

Hi all, it seems DuckDB can support composite types as in Redirecting…

I want to know how to insert into columns with one of these types from a DataFrame. I’ve tried the Appender API and directly using prepared statements, none of which seems to work. Can someone help me? A MWE is below:

using DuckDB, DataFrames

function load_table!(db::T, df::DataFrame, name) where {T<:DuckDB.DB}
    sch = DataFrame(DBInterface.execute(db, "DESCRIBE $(name);"))
    sch_cols = sch[!, :column_name]
    @assert Set(names(df)) == Set(sch_cols)
    appender = DuckDB.Appender(db, name)
    for i in eachrow(df)
        for j in sch_cols
            # database columns are ordered
            DuckDB.append(appender, i[j])
        end
        DuckDB.end_row(appender)
    end
    DuckDB.flush(appender)
    DuckDB.close(appender)
end

db_mem = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(db_mem, "CREATE TABLE tab1 (id INTEGER PRIMARY KEY, vals INTEGER[]);")
tab1_df = DataFrame(id=1:10, vals=fill([1,2,3],10))
load_table!(db_mem, tab1_df, "tab1") # errors

stmt = DBInterface.prepare(db_mem, "INSERT INTO tab1 VALUES(:id, :vals)") 
DBInterface.executemany(stmt, (id=tab1_df.id, vals=tab1_df.vals)) # errors

stmt = DBInterface.prepare(db_mem, "INSERT INTO tab1 VALUES(?, ?)") 
DBInterface.executemany(stmt, (id=tab1_df.id, vals=tab1_df.vals)) # errors
1 Like

Looks like it’s not supported at the moment (same issue for the Rust client: Appender support for arrays (or alternative) · Issue #422 · duckdb/duckdb-rs · GitHub).

As a workaround, you can:

# flatten the df
tab1_df = DataFrame(id=1:10, vals=fill([1,2,3],10))
flat_df = flatten(tab1_df, [:vals])

# register the data frame or create the table in whatever way
DuckDB.register_data_frame(db_mem, flat_df, "flat_tab1")

# create aggregated table 
DuckDB.query(
  db_mem,
  "CREATE OR REPLACE TABLE tab1 AS
  SELECT
    id,
    ARRAY_AGG(vals) AS vals,
  FROM flat_tab1
  GROUP BY
    id",
)

# trust buy verify
DuckDB.query(db_mem, "FROM tab1") |> DataFrame

Please let me know if this is helpful.

PS. If you can add the errors you’re having, it will help more people find this issue (especially in the future)

2 Likes

Thanks! I’m going to look into the appender API and see if it could be adapted to suit this use case.

The errors:

From append

ERROR: unsupported type for append
Stacktrace:
 [1] append(appender::DuckDB.Appender, val::Vector{Int64})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/appender.jl:108
 [2] load_table!(db::DuckDB.DB, df::DataFrame, name::String)
   @ Main ./REPL[16]:9
 [3] top-level scope
   @ REPL[20]:1

From prepared statements:

ERROR: unsupported type for bind
Stacktrace:
 [1] duckdb_bind_internal(stmt::DuckDB.Stmt, i::Int64, val::Vector{Int64})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/statement.jl:72
 [2] bind_parameters(stmt::DuckDB.Stmt, params::DBInterface.LazyIndex{@NamedTuple{id::Vector{…}, vals::Vector{…}}})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/statement.jl:78
 [3] execute(stmt::DuckDB.Stmt, params::DBInterface.LazyIndex{@NamedTuple{id::Vector{Int64}, vals::Vector{Vector{Int64}}}})
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/result.jl:709
 [4] execute
   @ ~/.julia/packages/DuckDB/5Gcwk/src/result.jl:872 [inlined]
 [5] #6
   @ ~/.julia/packages/DBInterface/nQcsk/src/DBInterface.jl:206 [inlined]
 [6] transaction(f::DBInterface.var"#6#8"{DuckDB.Stmt, @NamedTuple{…}, Int64}, con::DuckDB.Connection)
   @ DuckDB ~/.julia/packages/DuckDB/5Gcwk/src/transaction.jl:5
 [7] executemany(stmt::DuckDB.Stmt, params::@NamedTuple{id::Vector{Int64}, vals::Vector{Vector{Int64}}})
   @ DBInterface ~/.julia/packages/DBInterface/nQcsk/src/DBInterface.jl:203
 [8] top-level scope
   @ REPL[24]:1
1 Like