CUDA kernel: how to pass an array of functions

I am working on genetic programming, where each node of the tree picks a function from a set of functions fs by an integer index. An example demonstrating the difficulty I encountered is given below.

using CUDA

# a kernel that applies each function in fs to x in turn
function pass_func_array(x::Float32, fs)
    for f in fs
        x = f(x)
    end
    @cuprintln("x = $x")
    return nothing
end

where fs is an array of functions. However, the following does not work, since Function is not a bits type:

fs = CuArray([sin, cos])
LoadError: CuArray only supports bits types

fs = CuArray([CUDA.sin, CUDA.cos])
LoadError: CuArray only supports bits types

Any ideas on how to fix this?

I tried to pass symbols (i.e., function names), but a Symbol is not a bits type either.

Hi, dangerous half-knowledge here, but I think this is a use case for metaprogramming. As far as I know, everything you use in a kernel has to be hard-coded or must be inlinable. Maybe you can write a function that generates the entire kernel code for you, instead of passing things to the kernel.
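For what it's worth, a rough sketch of what I mean (untested; make_kernel and generated_kernel are names I just made up). The calls are spliced into the kernel body at definition time, so no non-isbits function objects are ever passed to the kernel:

using CUDA

function make_kernel(fs)
    calls = [:(x = $f(x)) for f in fs]   # e.g. x = CUDA.sin(x); x = CUDA.cos(x)
    @eval function generated_kernel(x::Float32)
        $(calls...)
        # comma form, since $-interpolation in the string would clash with @eval's splicing
        @cuprintln("x = ", x)
        return nothing
    end
end

k = make_kernel((CUDA.sin, CUDA.cos))   # the operations are baked into the kernel body
@cuda k(1f0)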

Thanks. Maybe that is a solution, but I will first ask the expert @maleadt for help. :smile:

You might find the answer I gave here helpful, although disclaimer I am not the expert :slight_smile: (and I mean that seriously, there might be better ways)

We currently don’t support indirect device function calls, so you need something that’s fully analyzable by the Julia compiler. The solution by @marius311 will work; it basically boils down to using a tuple instead of an array and making sure Julia sees individual accesses of that tuple (rather than iterating over it as in your for loop):

julia> function pass_func_array(x::Float32, fs)
           x = fs[1](x)
           @cuprintln("x = $x")
           y = fs[2](x)
           @cuprintln("y = $y")
           nothing
       end
pass_func_array (generic function with 1 method)

julia> @cuda pass_func_array(1f0, (CUDA.sin, CUDA.cos));

x = 0.841471
y = 0.666367
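The reason the tuple works is that a tuple of (singleton) functions is an isbits value, unlike an Array of functions:

julia> isbits((CUDA.sin, CUDA.cos))
true

julia> isbits([CUDA.sin, CUDA.cos])
false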

Many thanks.

Does it mean that only a constant index is allowed? The following example does not work either.

function pass_func_array2(x::Float32, order::CuArray, fs)
    for i in order
        x = fs[i](x)   # fails: i is only known at run time, so this is an indirect call
        @cuprintln("x = $x")
    end
    return nothing
end
@cuda pass_func_array2(3.14f0, CuArray([1, 2, 1]), (CUDA.sin, CUDA.cos))

The order above actually encodes an expression tree, where each integer represents an operation, and I'm trying to execute this tree on the GPU. I don't think it is practical to turn order into a tuple of functions directly, since it may be very long (tens or hundreds of entries). Is there any workaround to do what I want in CUDA.jl for now?

You could create a trampoline function, but generally, no, this pattern is not supported.
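A minimal sketch of such a trampoline, assuming the op set is just CUDA.sin and CUDA.cos (apply_op and pass_func_array3 are made-up names): the integer op code is dispatched through explicit branches, so the compiler only ever sees direct calls. Note that the ::CuArray annotation from your example is dropped, since @cuda converts a CuArray argument to its device-side counterpart:

using CUDA

# map an integer op code to a direct function call via explicit branches
@inline function apply_op(i, x)
    if i == 1
        return CUDA.sin(x)
    elseif i == 2
        return CUDA.cos(x)
    else
        return x   # unknown op code: leave x unchanged
    end
end

function pass_func_array3(x::Float32, order)
    for i in order
        x = apply_op(i, x)
        @cuprintln("x = $x")
    end
    return nothing
end

@cuda pass_func_array3(3.14f0, CuArray([1, 2, 1]))

This scales to long order vectors, since only the small op set is compiled into the kernel, not the sequence itself.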

Not sure if this is relevant, but recursion, map, and foldl tend to produce better code than for loops for short tuples, so you may try something like:

foldl(|>, fs, init=x)

rather than

for f in fs
    x = f(x)
end
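For instance, a complete version of the original kernel along those lines might look like this (fold_kernel is a made-up name):

using CUDA

# foldl over a short tuple is unrolled at compile time, so every call is direct
function fold_kernel(x::Float32, fs)
    x = foldl(|>, fs, init=x)
    @cuprintln("x = $x")
    return nothing
end

@cuda fold_kernel(1f0, (CUDA.sin, CUDA.cos))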

If fs is itself large, you may be better off replacing fs[i](x) with a single function f(x, i) that branches on i internally (if that's possible in your use case), much like the trampoline sketched above.
