I am working on genetic programming, where each node of the tree needs to pick a function from a set of functions fs by an integer index. A minimal example demonstrating the difficulty I ran into is given below.
using CUDA

# a kernel that applies each function in fs to x in turn
function pass_func_array(x::Float32, fs)
    for f in fs
        x = f(x)
    end
    @cuprintln("x = $x")
    nothing
end
where fs is an array of functions. However, the following does not work, since Function is not a bits type:
fs = CuArray([sin, cos])
LoadError: CuArray only supports bits types
fs = CuArray([CUDA.sin, CUDA.cos])
LoadError: CuArray only supports bits types
Any ideas on how to fix this?
I tried to pass symbols (i.e., function names), but a Symbol is not a bits type either.
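(For reference, both attempts trip the same check in the REPL: the array literal widens to the abstract Function element type, and Symbol is not a plain-data type either.)

julia> eltype([sin, cos])
Function

julia> isbitstype(Function), isbitstype(Symbol)
(false, false)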
Hi, dangerous half-knowledge here, but I think this is a use case for metaprogramming. As far as I know, everything you use in a kernel has to be hard-coded or at least inlinable. Maybe you can write a function that generates the entire kernel code for you, instead of passing things to the kernel, along the lines of the sketch below.
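Something in this direction (untested; make_chain_kernel and chain_kernel are made-up names): build the kernel body as an expression with every call hard-coded, then eval it, so the GPU compiler only ever sees concrete, statically known calls.

using CUDA

function make_chain_kernel(fnames)
    # compose the calls into a single expression, e.g. :(CUDA.cos(CUDA.sin(x)))
    body = foldl((ex, f) -> :($f($ex)), fnames; init = :x)
    @eval function chain_kernel(x::Float32)
        y = $body              # hard-coded chain of calls
        @cuprintln("y = ", y)
        nothing
    end
end

make_chain_kernel((:(CUDA.sin), :(CUDA.cos)))
@cuda chain_kernel(1f0)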
We currently don’t support indirect device function calls, so you need something that’s fully analyzable by the Julia compiler. The solution by @marius311 will work, which basically boils down to using a tuple instead of an array and making sure Julia sees individual accesses of that tuple (instead of an iteration like your for loop):
julia> function pass_func_array(x::Float32, fs)
           x = fs[1](x)
           @cuprintln("x = $x")
           y = fs[2](x)
           @cuprintln("y = $y")
           nothing
       end
pass_func_array (generic function with 1 method)

julia> @cuda pass_func_array(1f0, (CUDA.sin, CUDA.cos));
x = 0.841471
y = 0.666367
Does it mean that only a constant index is allowed? The following example does not work either.
# order is an array of integer indices selecting which function to apply
function pass_func_array2(x::Float32, order, fs)
    for i in order
        x = fs[i](x)
        @cuprintln("x = $x")
    end
    nothing
end
@cuda pass_func_array2(3.14f0, CuArray([1, 2, 1]), (CUDA.sin, CUDA.cos))
The order above actually encodes an expression tree, where each integer represents an operation, and I am trying to execute this tree on the GPU. I don't think it is practical to turn order into a tuple of functions directly, since it may be very long (tens or hundreds of entries). Is there any workaround to do what I want in CUDA.jl for now?
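For illustration, an untested sketch of one direction that keeps every call site static: branch on the index by hand, so each tuple access uses a literal index and every call target is known to the compiler. It clearly does not scale to a large or changing function set (apply_by_index and pass_func_array3 are made-up names).

function apply_by_index(i, fs, x)
    if i == 1
        return fs[1](x)
    elseif i == 2
        return fs[2](x)
    else
        return x          # unknown opcode: pass the value through
    end
end

function pass_func_array3(x::Float32, order, fs)
    for i in order
        x = apply_by_index(i, fs, x)
        @cuprintln("x = $x")
    end
    nothing
end

@cuda pass_func_array3(3.14f0, CuArray([1, 2, 1]), (CUDA.sin, CUDA.cos))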