The most general way to estimate the optimal arguments for the @cuda macro

fedoroff · April 6, 2021, 1:28pm

According to this reply, the current CUDA.jl API suggests the following approach for choosing the number of threads and blocks when launching a kernel:

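```julia
using CUDA

# Copy b into a with a grid-stride loop, so any launch size covers all of `a`
function kernel(a, b)
    id = (blockIdx().x - 1) * blockDim().x + threadIdx().x  # global 1-based thread index
    stride = blockDim().x * gridDim().x                     # total number of launched threads
    N = length(a)
    for k = id:stride:N
        a[k] = b[k]
    end
    return nothing
end

N = 1024
a = CUDA.zeros(N)
b = CUDA.rand(N)

ckernel = @cuda launch=false kernel(a, b)    # compile only, do not launch yet
config = launch_configuration(ckernel.fun)   # occupancy-based suggestion
threads = min(N, config.threads)             # no more threads than elements
blocks = cld(N, threads)                     # enough blocks to cover all N elements
ckernel(a, b; threads=threads, blocks=blocks)

@assert a == b
```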
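To see why `threads = min(N, config.threads)` and `blocks = cld(N, threads)` cover every element, and why the grid-stride loop stays correct even if fewer blocks are launched, the index arithmetic can be replayed on the CPU. This is a minimal sketch: `visit_counts` is a hypothetical helper, and `max_threads = 256` is an assumed stand-in for whatever `config.threads` returns on a given GPU.

```julia
# CPU-only replay of the kernel's grid-stride indexing (no GPU needed).
# Returns how many times each element 1:N would be visited.
function visit_counts(N, threads, blocks)
    hits = zeros(Int, N)
    stride = threads * blocks                  # blockDim().x * gridDim().x
    for bidx in 1:blocks, tidx in 1:threads
        id = (bidx - 1) * threads + tidx       # same formula as in `kernel`
        for k in id:stride:N
            hits[k] += 1
        end
    end
    return hits
end

max_threads = 256                              # assumed stand-in for config.threads
for N in (1024, 1000, 100)
    threads = min(N, max_threads)
    blocks = cld(N, threads)
    @assert all(==(1), visit_counts(N, threads, blocks))   # every element exactly once
end

# Launching fewer blocks than cld(N, threads) is still correct, just less parallel
@assert all(==(1), visit_counts(1000, 256, 1))
```

Threads whose `id` exceeds `N` simply run an empty loop, which is why rounding the block count up with `cld` is safe.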

This code looks a bit bulky to me.
Wouldn't it be more convenient to wrap it in a macro? Something like:

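```julia
macro krun(len, call)
    # `call` must be a function call such as `kernel(a, b)`
    Meta.isexpr(call, :call) || error("usage: @krun len kernel(args...)")
    args = call.args[2:end]

    @gensym kernel config threads blocks
    code = quote
        # note: `len` and the kernel arguments are evaluated more than once,
        # so pass plain variables rather than expressions with side effects
        local $kernel = @cuda launch=false $call
        local $config = launch_configuration($kernel.fun)
        local $threads = min($len, $config.threads)
        local $blocks = cld($len, $threads)
        $kernel($(args...); threads=$threads, blocks=$blocks)
    end

    # escape everything so `@cuda`, the kernel and its arguments resolve in the caller's scope
    return esc(code)
end

@krun N kernel(a, b)

@assert a == b
```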

It would then be possible to launch a kernel by specifying only the number of parallel work items it needs (here, the array length `N`): `@krun N kernel(a, b)`.
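Both the explicit version and `@krun` share one edge case: when the work size is 0, `threads` becomes 0 and `cld(0, 0)` throws a `DivideError`. As a sketch, the arithmetic could be moved into a plain function that the macro body calls. `launch_dims` is a hypothetical name, and returning `(0, 0)` so the caller can skip the launch is an assumption, not CUDA.jl behaviour.

```julia
# Hypothetical helper: turn a work size and the occupancy suggestion
# (`config.threads` from launch_configuration) into (threads, blocks).
function launch_dims(len::Integer, max_threads::Integer)
    len > 0 || return (0, 0)          # nothing to launch; avoids cld(0, 0)
    threads = min(len, max_threads)    # never more threads than work items
    blocks = cld(len, threads)         # enough blocks to give every item a thread
    return (threads, blocks)
end

@assert launch_dims(1024, 256) == (256, 4)
@assert launch_dims(1000, 256) == (256, 4)
@assert launch_dims(100, 256) == (100, 1)
@assert launch_dims(0, 256) == (0, 0)
```

Inside `@krun`, the two `local` lines computing the threads and blocks could then become a single call such as `launch_dims($len, $config.threads)`, which also evaluates `len` only once.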
