I am working on genetic programming, where each node of the tree needs to pick a function from a set of functions fs by an integer index. A minimal example demonstrating the difficulty I ran into is given below.
using CUDA

# a kernel that applies each function in fs to x in turn
function pass_func_array(x::Float32, fs)
    for f in fs
        x = f(x)
    end
    @cuprintln("x = $x")
    nothing
end
where fs is an array of functions. However, the following does not work, since Function is not a bits type:
fs = CuArray([sin, cos])
LoadError: CuArray only supports bits types
fs = CuArray([CUDA.sin, CUDA.cos])
LoadError: CuArray only supports bits types
Any ideas on how to fix this?
I tried to pass symbols (i.e., function names), but a Symbol is not a bits type either.
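(For reference, both attempts trip the same check in the REPL: the array literal widens to the abstract Function element type, and Symbol is not a plain-data type either.)

julia> eltype([sin, cos])
Function

julia> isbitstype(Function), isbitstype(Symbol)
(false, false)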
Hi, dangerous half-knowledge here, but I think this is a use case for metaprogramming. As far as I know, everything you use in a kernel has to be hard-coded or at least inlinable. Maybe you can write a function that generates the entire kernel code for you, instead of passing things to the kernel, along the lines of the sketch below.
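Something in this direction (untested; make_chain_kernel and chain_kernel are made-up names): build the kernel body as an expression with every call hard-coded, then eval it, so the GPU compiler only ever sees concrete, statically known calls.

using CUDA

function make_chain_kernel(fnames)
    # compose the calls into a single expression, e.g. :(CUDA.cos(CUDA.sin(x)))
    body = foldl((ex, f) -> :($f($ex)), fnames; init = :x)
    @eval function chain_kernel(x::Float32)
        y = $body              # hard-coded chain of calls
        @cuprintln("y = ", y)
        nothing
    end
end

make_chain_kernel((:(CUDA.sin), :(CUDA.cos)))
@cuda chain_kernel(1f0)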
We currently don’t support indirect device function calls, so you need something that’s fully analyzable by the Julia compiler. The solution by @marius311 will work, which basically boils down to using a tuple instead of an array and making sure Julia sees individual accesses of that tuple (instead of an iteration like your for loop):
julia> function pass_func_array(x::Float32, fs)
           x = fs[1](x)
           @cuprintln("x = $x")
           y = fs[2](x)
           @cuprintln("y = $y")
           nothing
       end
pass_func_array (generic function with 1 method)

julia> @cuda pass_func_array(1f0, (CUDA.sin, CUDA.cos));
x = 0.841471
y = 0.666367
Does it mean that only a constant index is allowed? The following example does not work either.
# order is an array of integer indices selecting which function to apply
function pass_func_array2(x::Float32, order, fs)
    for i in order
        x = fs[i](x)
        @cuprintln("x = $x")
    end
    nothing
end
@cuda pass_func_array2(3.14f0, CuArray([1, 2, 1]), (CUDA.sin, CUDA.cos))
The order above actually encodes an expression tree, where each integer represents an operation, and I am trying to execute this tree on the GPU. I don't think it is practical to turn order into a tuple of functions directly, since it may be very long (tens or hundreds of entries). Is there any workaround to do what I want in CUDA.jl for now?
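For illustration, an untested sketch of one direction that keeps every call site static: branch on the index by hand, so each tuple access uses a literal index and every call target is known to the compiler. It clearly does not scale to a large or changing function set (apply_by_index and pass_func_array3 are made-up names).

function apply_by_index(i, fs, x)
    if i == 1
        return fs[1](x)
    elseif i == 2
        return fs[2](x)
    else
        return x          # unknown opcode: pass the value through
    end
end

function pass_func_array3(x::Float32, order, fs)
    for i in order
        x = apply_by_index(i, fs, x)
        @cuprintln("x = $x")
    end
    nothing
end

@cuda pass_func_array3(3.14f0, CuArray([1, 2, 1]), (CUDA.sin, CUDA.cos))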