Is it hard to support Julia UDFs in DuckDB?

As far as I can tell there’s no UDF in Julia for DuckDB. Gemini says it’s easy to add but a google search turns up this Github issue which seems to suggest that it is available and need some work.

Does anyone know what’s the actual story here? Trying to see if it’s possible for Julia

As is see - table_function already supported.

    DuckDB.create_table_function(
        con,
        "forty_two_nulls",
        [Int64],
        my_bind_function,
        my_init_function,
        my_main_function_nulls
    )
    results = DBInterface.execute(con, 
       "SELECT COUNT(*) total_cnt, COUNT(forty_two) cnt FROM forty_two_nulls(10000)")

I can’t recall understand this code at all. So I am guessing, you are allowed to define functions that create tables. Not but a function you can use within the body of the SQL to compute columns.

It appears that the DuckDB C API now includes support for scalar functions, which was added about four months ago.

“This PR adds initial support for scalar functions to the C API. This allows simple scalar functions to be defined and registered using only the C API, which in turn will enable these functions to be added using other APIs that build on top of the C API (e.g. Rust or Go).”

(my guess)
To integrate this functionality with the Julia package, the duckdb_scalar_function* should be added to the api.jl file in the Julia package repository:

2 Likes

@ImreSamu were you able successfully add and use a UDF?

Unfortunately, I haven’t had time to work on this.
Feel free to handle it if you can.

1 Like

This should work if you clone and modify the package to add the c api support.

add scalar api to DuckDB.jl package
# ctypes.jl
const duckdb_scalar_function = Ptr{Cvoid}

# api.jl scalar functions
function duckdb_create_scalar_function()
    return ccall((:duckdb_create_scalar_function, libduckdb), duckdb_scalar_function, ())
end

function duckdb_destroy_scalar_function(func)
    return ccall((:duckdb_destroy_scalar_function, libduckdb), Cvoid, (Ref{duckdb_scalar_function},), func)
end

function duckdb_scalar_function_set_name(func, name)
    return ccall((:duckdb_scalar_function_set_name, libduckdb), Cvoid, (duckdb_scalar_function, Ptr{UInt8}), func, name)
end

function duckdb_scalar_function_add_parameter(func, type)
    return ccall(
        (:duckdb_scalar_function_add_parameter, libduckdb),
        Cvoid,
        (duckdb_scalar_function, duckdb_logical_type),
        func,
        type
    )
end

function duckdb_scalar_function_set_return_type(func, type)
    return ccall(
        (:duckdb_scalar_function_set_return_type, libduckdb),
        Cvoid,
        (duckdb_scalar_function, duckdb_logical_type),
        func,
        type
    )
end

function duckdb_scalar_function_set_function(scalar_func, func)
    return ccall(
        (:duckdb_scalar_function_set_function, libduckdb),
        Cvoid,
        (duckdb_scalar_function, Ptr{Cvoid}),
        scalar_func,
        func
    )
end

function duckdb_register_scalar_function(con, func)
    return ccall(
        (:duckdb_register_scalar_function, libduckdb),
        Int32,
        (duckdb_connection, duckdb_scalar_function),
        con,
        func
    )
end
MyAddition example
using DuckDB
function MyAddition(info::DuckDB.duckdb_function_info, input::DuckDB.duckdb_data_chunk, output::DuckDB.duckdb_vector)
    input = DuckDB.DataChunk(input, false)
    n = DuckDB.get_size(input)
    a_data = DuckDB.get_array(DuckDB.get_vector(input, 1), Int64, n)
    b_data = DuckDB.get_array(DuckDB.get_vector(input, 2), Int64, n)

    output_data = DuckDB.get_array(DuckDB.Vec(output), Int64, n)
    for row in 1:n
        output_data[row] = a_data[row] + b_data[row]
    end
end


db = DuckDB.DB()
con = DuckDB.connect(db)
# create a scalar function
f = DuckDB.duckdb_create_scalar_function()
DuckDB.duckdb_scalar_function_set_name(f, "my_addition")
# add two bigint parameters
type = DuckDB.duckdb_create_logical_type(DuckDB.DUCKDB_TYPE_BIGINT)
DuckDB.duckdb_table_function_add_parameter(f, type)
DuckDB.duckdb_table_function_add_parameter(f, type)
# set the return type to bigint
DuckDB.duckdb_scalar_function_set_return_type(f, type)
DuckDB.duckdb_destroy_logical_type(type)
# set up the function
CMyAddition = @cfunction(MyAddition, Cvoid, (DuckDB.duckdb_function_info, DuckDB.duckdb_data_chunk, DuckDB.duckdb_vector))
DuckDB.duckdb_scalar_function_set_function(f, CMyAddition)
# register and cleanup
DuckDB.duckdb_register_scalar_function(con.handle, f)
DuckDB.duckdb_destroy_scalar_function(f)

DuckDB.query(con, "CREATE TABLE big_table AS SELECT i FROM range(9) t(i)")
DuckDB.query(con, "SELECT my_addition(i, i) FROM big_table")
4 Likes

wow @era127 thank you so much for this example. I got it and others to work.

My understanding is that the Julia DuckDB package is more or less community maintained… Is there a reason you know of that this has functionality has not been added to official package?

Because I think it would be nice to have this without having to update the api locally

The client api is community maintained, but they have typically accepted PR within a couple of days. I think you could make a PR that would be accepted; using the table_function c api functions as a template to write the scalar_function api calls.

2 Likes

this is amazing! I want to have a crack at this

1 Like

ok, so I have put this API to a cloned DuckDB repo. I have passing tests. Before I put in a PR, are there any thoughts folks have around this? I am not sure how capable I am to make changes to the existing api that @era127 kindly provided.

I might toy around and try to make a function call such as the below that wraps up all of the different calls above into a single more convenient version.

create_scalar_function(
    con::DuckDB.Connection,
    name::AbstractString,
    parameter_types::Vector{Union{DuckDB.DuckDBType, Type}},
    return_type::Union{DuckDB.DuckDBType, Type},
    julia_function::Function
)

cc @xiaodai @ImreSamu

1 Like

This is awesome. Maybe just my ignorance, it says scalar function here but it’s also usable as a summarisation? right e.g. select grp, my_func(vars1) from xyz group by abc.

Is the ability implement vectorised arrays also the same? can seem to find the answer on my cursory search

I think that would require all of the “aggregate_function” c api calls to be exposed in the Julia client api. They are similarly named like duckdb_create_aggregate_function.

You can see the c api example in the test case.

1 Like

I guess that means it’s possible and just need some work

1 Like