This one has my head hurting. I’m not even sure how to explain the problem, but I’ll do my best.
I am working on a project using CUDAnative and one of my kernels takes a function as an argument.
I would like to pass additional arguments to this function. If I pass a number as an argument it works, but if I pass a variable it doesn’t. Is there a way to “apply” variables as constants? Can this be done with a macro?
This minimal example shows what I mean:
using CUDAnative
using CuArrays
function applyfun(x::CuDeviceArray, F::CuDeviceArray, f)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    F[i] = f(x[i])
    return nothing
end
nthreads = 512
nblocks = 10
F = cuzeros(nthreads * nblocks)
x = CuArray(LinRange(0, 10, nthreads * nblocks))
f(x, c) = c*x^2
# This works
g1(x) = f(x, 1)
@cuda blocks=nblocks threads=nthreads applyfun(x, F, g1)
# This doesn't work
b = 1
g2(x) = f(x, b)
@cuda blocks=nblocks threads=nthreads applyfun(x, F, g2)
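That doesn’t work because g2 refers to the global variable b, which is not const: it can be reassigned at any time (even to a value of a different type), so the compiler cannot bake its value into the GPU kernel. Make it a const and it works: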
julia> const c = 1
1
julia> g3(x) = f(x,c)
g3 (generic function with 1 method)
julia> @cuda blocks=nblocks threads=nthreads applyfun(x, F, g3)
I tried that, but it doesn’t help because I need to run a model that loops over the variable, so it can’t be constant. Is there another way to do it?
Sure, but then you can’t just capture a CPU variable. You’ll need to put that counter in GPU memory, e.g. using a single-element array.
Thanks, that solves my problem. What I found worked was to pass the variable into the kernel.
This works:
using CUDAnative
using CuArrays
function applyfun(x::CuDeviceArray, F::CuDeviceArray, f, C)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    F[i] = f(x[i], C)
    return nothing
end
nthreads = 512
nblocks = 10
F = cuzeros(nthreads * nblocks)
x = CuArray(LinRange(0, 10, nthreads * nblocks))
f(x, c) = c*x^2
C = 2
@cuda blocks=nblocks threads=nthreads applyfun(x, F, f, C)
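For the looping case, another option (sketched here as an assumption, not something confirmed in this thread) is to build the closure inside a function, so that it stores the current value of the variable as a field instead of referring to the non-const global. A closure that only captures isbits values (such as an Int) is itself isbits, which, as far as I understand, is what CUDAnative needs in order to pass it as a kernel argument. The code below only runs the CPU side; `make_g` is a hypothetical helper name, and the kernel launch is left as a comment.

```julia
f(x, c) = c * x^2

# Hypothetical helper: returns a closure that stores `c` as a field.
# `c` is an Int (isbits) and is never reassigned, so the closure is isbits too.
make_g(c) = x -> f(x, c)

results = Float64[]
for c in 1:3                 # loop over the parameter, as the model would
    g = make_g(c)            # fresh closure holding the current value of c
    @assert isbits(g)        # no reference to a CPU global inside g
    # With a GPU, this is where the kernel would be launched, e.g.
    # @cuda blocks=nblocks threads=nthreads applyfun(x, F, g)
    push!(results, g(2.0))   # CPU evaluation only, to show the captured value
end
```

Compared with passing C as an extra kernel argument, this keeps the original one-argument applyfun unchanged, at the cost of compiling a kernel per closure type (the type is the same for every Int value of c, so it should only compile once).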
I would like to learn more about metaprogramming. Do you think it’s possible to capture variables using a macro?
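One caveat that applies to any macro approach: a macro runs on the syntax at parse time, so it never sees the runtime value of b. What it can do is rewrite the closure so the value gets copied into it when the closure is created. A minimal sketch, using a hypothetical macro name `@capture` (not part of CUDAnative or Base), that wraps an anonymous function in a `let` block rebinding the listed variables:

```julia
# Hypothetical macro: `@capture b x -> f(x, b)` expands to
#     let b = b
#         x -> f(x, b)
#     end
# so the closure stores the current value of `b` instead of the global binding.
macro capture(vars, fexpr)
    # Accept a single name (`b`) or a tuple of names (`(b, d)`).
    names = vars isa Symbol ? [vars] : vars.args
    bindings = Expr(:block, [:($v = $v) for v in names]...)
    # Escape everything: all names come from the caller's scope.
    return esc(Expr(:let, bindings, Expr(:block, fexpr)))
end

f(x, c) = c * x^2

b = 3
g = @capture b x -> f(x, b)
b = 100   # rebinding the global afterwards does not change g
```

Here g(2.0) uses the value b had when the closure was built (3), not the later 100. Whether the resulting closure is accepted by @cuda depends on the captured values being isbits, which I have not verified on a GPU.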