Using DuckDB appender with JSON column

Hi all, I’m interested to know if it would be possible to adjust the Appender API in DuckDB to support JSON. Let me explain.

As noted here JSON Overview – DuckDB we can create columns with the JSON data type. However, if we say create a dataframe with a column of formatted JSON strings and naively use the appender, it won’t work. I guess that when the appender gets a String type from Julia, it will call down to the duckdb_append_varchar method (see duckdb/tools/juliapkg/src/appender.jl at main · duckdb/duckdb · GitHub) but there is no automatic coversion of the VARCHAR (corresponding DuckDB type) to JSON. Although the bottom of this page indicates that it is possible, at least in some cases JSON Type – DuckDB.

So, to ask the DuckDB experts, is this possible? Is it possible from the Julia level to some how detect a column has JSON type and allow VARCHAR conversion? I also looked at the C API and am not even sure how (if) this would work at that level. Appender – DuckDB

For interest, here is a little code snippet that illustrates what I mean.

using DuckDB, DataFrames, JSON3

if isfile("./test.db")
    rm("./test.db")
end

con = DBInterface.connect(DuckDB.DB, "./test.db")
DBInterface.execute(con, """
    CREATE TABLE tab (
        name TEXT PRIMARY KEY,
        data JSON
    );
""")

# works
# DBInterface.execute(con, """
#     INSERT INTO tab VALUES
#     ('a', '{"name":"Beavis", "age":30, "car":null}'),
#     ('b', '{"name":"Butthead", "age":30, "car":"bad"}');
# """)

# Appender won't work
df = DataFrame(
    name = ["a", "b"],
    data = ["""{"name":"Beavis", "age":30, "car":null}""", """{"name":"Butthead", "age":30, "car":"bad"}"""]
)

appender = DuckDB.Appender(con, "tab")
for i in eachrow(df)
    for j in i
        DuckDB.append(appender, j)
    end
    DuckDB.end_row(appender)
end

# get the data
vals = DBInterface.execute(con, "SELECT * FROM tab;") |> DataFrame
JSON3.read(vals[1, :data])

DBInterface.close!(con)

Maybe I’m not understanding your question, but I don’t seem to have an issue with using the Appender for JSON data. Example:

#Data where :metadata is a column with JSON string as a String
  data = DataFrames.select(
  withJStr, #Other DataFrame with example JSON String column and other irrelevant data
  :id => (x->collect(1:length(x))),
  :id => identity => :acc,
  :metadata
);

#Appender API adding
DBInterface.execute(dconn,
    "CREATE OR REPLACE TABLE jtable (id INTEGER PRIMARY KEY, acc STRING, metadata JSON)")
# create data to insert
appender = DuckDB.Appender(dconn, "jtable")
for i in eachrow(data)
    for j in i
        DuckDB.append(appender, j)
    end
    DuckDB.end_row(appender)
end
# close the appender after all rows
DuckDB.close(appender)

DBInterface.execute(dconn,"select metadata->'\$.annotationScore' as jPathQ, * from jtable") |> DataFrame

returns example:

Row  	jPathQ	id	     acc	                       metadata
         String	Int32	String	                        String 
1	      3	      1	    uniprot_meta:A0A009H1I9	       {"annotationScore":3,....(truncated for space)
...
...

Is that what you are trying to do?

1 Like

What’s withJStr?

It’s just a DataFrame with a column that has a JSON String. It had a bunch of other junk too so I just selected the example columns needed.

1 Like

Ok, thanks. It still appears to not work for me, unless you see something I’m doing wrong here. However directly using DBInterface.execue works, as in the main MWE above.

julia> # Appender won't work
       df = DataFrame(
           name = ["a", "b"],
           data = JSON3.write.([Dict(:k1=>"v", :k2=>30), Dict(:k1=>"x", :k2=>1)])
       )
2×2 DataFrame
 Row │ name    data               
     │ String  String             
─────┼────────────────────────────
   1 │ a       {"k1":"v","k2":30}
   2 │ b       {"k1":"x","k2":1}

julia> appender = DuckDB.Appender(con, "tab")
DuckDB.Appender(Ptr{Nothing} @0x000000014a750b00)

julia> for i in eachrow(df)
           for j in i
               DuckDB.append(appender, j)
           end
           DuckDB.end_row(appender)
       end

       # get the data

julia> vals = DBInterface.execute(con, "SELECT * FROM tab;") |> DataFrame
0×2 DataFrame
 Row │ name    data   
     │ String  String 
─────┴────────────────

Closing this. It was a dumb mistake. I didn’t close the appender DuckDB.close(appender). JSON is working fine. Thanks for the help @NAS.