CUDA: blockdimensions and launch_configuration

ErwinMoller · April 17, 2024, 2:00pm

Hello,

I am working on my understanding of blocks and threads per block (which CUDA calls the block dimension, blockDim).
So I wrote a simple kernel (called gpu_heavy! to impress my friends) that writes some values, such as blockIdx().x, back to 4 CuArrays.

Below is the boilerplate launch code:

    # Compile the kernel without launching it
    myKernel = @cuda name = "I_AM_SOME_KERNEL" launch = false gpu_heavy!(d_thisBlockIdx, d_thisBlockDimx, d_thisThreadIdx, d_thisGridDimx)
    # Ask the occupancy API for a suggested configuration
    config = launch_configuration(myKernel.fun)
    threads = min(length(d_thisBlockIdx), config.threads)
    blocks = cld(length(d_thisBlockIdx), threads)
    println("According to launch_configuration optimal threads=", threads, " and optimal blocks is: ", blocks)
    # And launch
    CUDA.@time myKernel(d_thisBlockIdx, d_thisBlockDimx, d_thisThreadIdx, d_thisGridDimx; threads=threads, blocks=blocks)

I understand that the above code uses the occupancy API, which returns reasonable values for the number of threads and blocks.
Is that correct?
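
To make the arithmetic in the launch code concrete, here is a CPU-only sketch of the two lines that compute threads and blocks. The array length and the suggested thread count are hypothetical values I picked for illustration; the real config.threads depends on the kernel and the device.

```julia
# CPU-only sketch of the launch arithmetic; no GPU required.
n = 10_000                 # hypothetical array length
suggested_threads = 1024   # hypothetical stand-in for config.threads

threads = min(n, suggested_threads)  # never request more threads than elements
blocks = cld(n, threads)             # ceiling division so every element is covered

println("threads = ", threads, ", blocks = ", blocks)
```

With these assumed numbers, blocks * threads is at least n, so each element gets at least one thread in a single pass.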

The kernel looks like this:

    function gpu_heavy!(d_thisBlockIdx, d_thisBlockDimx, d_thisThreadIdx, d_thisGridDimx)
        # Remember: this kernel has no direct idea which portion of the workload it is processing,
        # so we need a way to map each thread to a unique part of the passed arrays (here they all have the same length)
        thisBlockIdx = blockIdx().x
        thisBlockDimx = blockDim().x  # Block dimension: I prefer to call it threadsPerBlock. There are also y and z components
        thisThreadIdx = threadIdx().x
        thisGridDimx = gridDim().x
        # Global 1-based index of this thread across the whole grid
        index = (thisBlockIdx - 1) * thisBlockDimx + thisThreadIdx

        # Grid-stride loop: total number of threads in the grid
        stride = thisGridDimx * thisBlockDimx
        for i = index:stride:length(d_thisBlockIdx)
            @inbounds d_thisBlockIdx[i] = thisBlockIdx
            @inbounds d_thisBlockDimx[i] = thisBlockDimx
            @inbounds d_thisThreadIdx[i] = thisThreadIdx
            @inbounds d_thisGridDimx[i] = thisGridDimx
        end
        return nothing
    end
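
As a sanity check on the indexing, the grid-stride pattern in the kernel can be simulated on the CPU. This is only a sketch under my own assumptions: simulate_grid_stride is a hypothetical helper, not part of CUDA.jl, and it loops over block and thread indices sequentially instead of running them in parallel.

```julia
# CPU-only simulation of the 1-D grid-stride indexing used in the kernel.
# simulate_grid_stride is a hypothetical helper, not a CUDA.jl function.
function simulate_grid_stride(n, blocks, threads)
    hits = zeros(Int, n)               # how many times each element is visited
    stride = blocks * threads          # plays the role of gridDim().x * blockDim().x
    for b in 1:blocks, t in 1:threads  # CUDA.jl indices are 1-based
        index = (b - 1) * threads + t  # same formula as in the kernel
        for i in index:stride:n
            hits[i] += 1
        end
    end
    return hits
end

hits = simulate_grid_stride(10, 2, 3)
println(all(==(1), hits))
```

If every entry of hits equals 1, each element is written by exactly one simulated thread, even when blocks * threads is smaller or larger than n.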

I understand that both threads and blocks can also be two- or three-dimensional (the .y and .z components).
But the above code ignores that fact completely.
Since I am not working in 2 or 3 dimensions, that is fine for me.

My question is this:
My call to CUDA.@time myKernel(d_thisBlockIdx, d_thisBlockDimx, d_thisThreadIdx, d_thisGridDimx; threads=threads, blocks=blocks) launches the work on the GPU.
But is it possible there are also y and z dimensions coming from config = launch_configuration(myKernel.fun) (which I am clearly ignoring)?

Thanks for your time!

Erwin
