Supporting appending `Vector` in DuckDB

Hi all, following up from How to insert nested/composite types into DuckDB from a DataFrame? - #3 by slwu89 I’m peeking into what it would take to support inserting, say Vector{Int} into a DuckDB database using the appender API.

It appears that even supporting this feature in the DuckDB C API is a relatively recent addition, see add duckdb_append_value to C API by jraymakers · Pull Request #15065 · duckdb/duckdb · GitHub which added duckdb_append_value to the C API. Luckily because the Julia interface to the C API is programatically generated, it is there, waiting to be used.

I want to start thinking about how to do the simple case of inserting a vector of primitive types into the DB. Looking at the tests for duckdb_append_value, the relevant example is here: public-duckdb/test/api/capi/test_capi_appender.cpp at 5fb7d105c5bc3806921cb596db98b46f6166df5b · motherduckdb/public-duckdb · GitHub. Peeking at the test, the basic thing is:

  1. use duckdb_create_list_value (see Values – DuckDB) to create an instance of duckdb_value, which stores the element type, the pointer to values, and the length of the list.
  2. use duckdb_append_value (see Appender – DuckDB) to append that duckdb_value from 1.

My main question at this point revolves around step 1. Because the values argument for duckdb_create_list_value is a duckdb_value*, it seems I’d have to detect the eltype of the Vector and then use the appropriate duckdb_create_X method from the Julia interface to the C API. Does this approach seem reasonable or have I gotten things wrong (yes, ignoring missings for now)? I’m new to DuckDB and its Julia package, but I’m interested in contributing to expanding the functionality of the Julia interface.

1 Like

There is a PR which has been opened to support this.

Currently in the appender a vector of UInt8 is interpreted as a duckdb BLOB type and not UTINYINT. It would be a breaking change to the appender api to change this. I’m not sure which Julia types would best map to the duckdb types.

2 Likes

Just a public service announcement, thanks to @era127’s [Julia] Add support for appending duckdb List types by era127 · Pull Request #16512 · duckdb/duckdb · GitHub which is merged, if you update the DuckDB package, the code I originally had trouble with in the previous discourse topic now runs as expected (see below):

using DuckDB, DataFrames

function load_table!(db::T, df::DataFrame, name) where {T<:DuckDB.DB}
    sch = DataFrame(DBInterface.execute(db, "DESCRIBE $(name);"))
    sch_cols = sch[!, :column_name]
    @assert Set(names(df)) == Set(sch_cols)
    appender = DuckDB.Appender(db, name)
    for i in eachrow(df)
        for j in sch_cols
            # database columns are ordered
            DuckDB.append(appender, i[j])
        end
        DuckDB.end_row(appender)
    end
    DuckDB.flush(appender)
    DuckDB.close(appender)
end

db_mem = DBInterface.connect(DuckDB.DB, ":memory:")
DBInterface.execute(db_mem, "CREATE TABLE tab1 (id INTEGER PRIMARY KEY, vals INTEGER[]);")
tab1_df = DataFrame(id=1:10, vals=fill([1,2,3],10))
load_table!(db_mem, tab1_df, "tab1") # errors
DBInterface.execute(db_mem, "SELECT * FROM tab1;") |> DataFrame
3 Likes