Supporting appending `Vector` in DuckDB

Hi all, following up from How to insert nested/composite types into DuckDB from a DataFrame? - #3 by slwu89 I’m peeking into what it would take to support inserting, say Vector{Int} into a DuckDB database using the appender API.

It appears that even supporting this feature in the DuckDB C API is a relatively recent addition, see add duckdb_append_value to C API by jraymakers · Pull Request #15065 · duckdb/duckdb · GitHub which added duckdb_append_value to the C API. Luckily because the Julia interface to the C API is programatically generated, it is there, waiting to be used.

I want to start thinking about how to do the simple case of inserting a vector of primitive types into the DB. Looking at the tests for duckdb_append_value, the relevant example is here: public-duckdb/test/api/capi/test_capi_appender.cpp at 5fb7d105c5bc3806921cb596db98b46f6166df5b · motherduckdb/public-duckdb · GitHub. Peeking at the test, the basic thing is:

  1. use duckdb_create_list_value (see Values – DuckDB) to create an instance of duckdb_value, which stores the element type, the pointer to values, and the length of the list.
  2. use duckdb_append_value (see Appender – DuckDB) to append that duckdb_value from 1.

My main question at this point revolves around step 1. Because the values argument for duckdb_create_list_value is a duckdb_value*, it seems I’d have to detect the eltype of the Vector and then use the appropriate duckdb_create_X method from the Julia interface to the C API. Does this approach seem reasonable or have I gotten things wrong (yes, ignoring missings for now)? I’m new to DuckDB and its Julia package, but I’m interested in contributing to expanding the functionality of the Julia interface.

There is a PR which has been opened to support this.

Currently in the appender a vector of UInt8 is interpreted as a duckdb BLOB type and not UTINYINT. It would be a breaking change to the appender api to change this. I’m not sure which Julia types would best map to the duckdb types.

Just a public service announcement, thanks to @era127’s [Julia] Add support for appending duckdb List types by era127 · Pull Request #16512 · duckdb/duckdb · GitHub which is merged, if you update the DuckDB package, the code I originally had trouble with in the previous discourse topic now runs as expected (see below):

using DuckDB, DataFrames

function load_table!(db::T, df::DataFrame, name) where {T<:DuckDB.DB}
    sch = DataFrame(DBInterface.execute(db, "DESCRIBE $(name);"))
    sch_cols = sch[!, :column_name]
    @assert Set(names(df)) == Set(sch_cols)
    appender = DuckDB.Appender(db, name)
    for i in eachrow(df)
        for j in sch_cols
            # database columns are ordered
            DuckDB.append(appender, i[j])
        end
        DuckDB.end_row(appender)
    end
    DuckDB.flush(appender)
    DuckDB.close(appender)
end

db_mem = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(db_mem, "CREATE TABLE tab1 (id INTEGER PRIMARY KEY, vals INTEGER[]);")
tab1_df = DataFrame(id=1:10, vals=fill([1,2,3],10))
load_table!(db_mem, tab1_df, "tab1") # errors
DBInterface.execute(db_mem, "SELECT * FROM tab1;") |> DataFrame