CUDAnative: Using second and third dims in the kernel

Colin_Beckingham · January 31, 2017, 1:03pm

In my attempts to generalize the example code given in the README.md, I have had good results, including processing arrays larger than the device block size, but I have hit a snag regarding the use of the second and third dimensions in the kernel.

I believe the code as presented processes the 3x4 array of floats as a 1x12 vector with the transfer back to an array somehow accomplished in the background. I guess I have a few questions: is it wise to ensure that the thread index does not exceed the dims of the known working area or is that taken care of transparently? My attempts to pass up a tuple for the grid dimensions in @cuda(…) were successful, but I am not getting sensible results from these efforts. I wonder if it might be helpful to include an example where we use at least the y component of the {x,y,z} set?

I can post some code if of interest.

Colin_Beckingham · January 31, 2017, 1:39pm

OK I think I have answered my own question and have it working, thanks.
However, the puzzling thing is that I can write the kernel two ways, using the x dim alone or using both x and y, and both produce a result which passes the test. I’ll get it eventually.
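
One possible explanation, sketched in plain Python rather than CUDAnative since the kernels themselves were not posted: if the 3x4 array is stored column-major (as Julia arrays are), a launch of 12 threads that uses only x as a linear index reaches the same cells as a 3x4 launch that uses (x, y) as (row, col). The function names below are hypothetical, and the indices are 1-based to mirror Julia.

```python
# Plain-Python simulation of thread indexing; this is not CUDAnative code.
# Assumption: the array is column-major and indices are 1-based, as in Julia.

ROWS, COLS = 3, 4  # shape of the array discussed in the post


def cells_from_1d_launch(n_threads):
    """Cells reached when each thread treats its x index as a linear index."""
    cells = []
    for x in range(1, n_threads + 1):
        row = (x - 1) % ROWS + 1   # position within the column
        col = (x - 1) // ROWS + 1  # which column the linear index falls in
        cells.append((row, col))
    return cells


def cells_from_2d_launch(threads_x, threads_y):
    """Cells reached when each thread treats (x, y) as (row, col)."""
    return [(x, y)
            for y in range(1, threads_y + 1)
            for x in range(1, threads_x + 1)]


one_d = cells_from_1d_launch(ROWS * COLS)
two_d = cells_from_2d_launch(ROWS, COLS)
print(sorted(one_d) == sorted(two_d))
```

Under these assumptions both launches cover every cell exactly once, so an element-wise kernel would give the same result either way.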

maleadt · January 31, 2017, 3:58pm

Depends on the kernel preconditions. If you know that the index calculation will never yield an out-of-bounds index, you don’t have to check. But when generalizing to larger arrays, that might not be possible (e.g. 513 items on a device limited to 512 threads per block means launching 2 blocks of 512 threads, so most threads in the second block have no item to process).
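
To make the 513-item case concrete, here is a minimal plain-Python simulation (not CUDAnative code) of a guarded kernel. The helper names `launch_config` and `run_guarded_kernel` are hypothetical, the 512-thread limit is taken from the example above, and the global index is computed 1-based as (block - 1) * threads + thread.

```python
# Plain-Python simulation of a bounds-checked kernel; not CUDAnative code.
# Assumption: n_items >= 1 and a per-block limit of 512 threads.

MAX_THREADS_PER_BLOCK = 512


def launch_config(n_items, max_threads=MAX_THREADS_PER_BLOCK):
    """Pick a thread count and the number of blocks needed to cover n_items."""
    threads = min(n_items, max_threads)
    blocks = (n_items + threads - 1) // threads  # ceiling division
    return threads, blocks


def run_guarded_kernel(data):
    """Add 1 to every element, skipping threads whose index is out of bounds."""
    n = len(data)
    threads, blocks = launch_config(n)
    out = list(data)
    skipped = 0
    for block in range(1, blocks + 1):
        for thread in range(1, threads + 1):
            i = (block - 1) * threads + thread  # 1-based global index
            if i <= n:                          # the bounds check
                out[i - 1] = data[i - 1] + 1
            else:
                skipped += 1                    # thread has nothing to do
    return out, threads, blocks, skipped


out, threads, blocks, skipped = run_guarded_kernel([0.0] * 513)
print(threads, blocks, skipped)
```

Without the `i <= n` guard, the surplus threads in the last block would index past the end of the array.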

There’s a lot of existing literature on how to flexibly generalize kernels, e.g. by writing grid-stride loops. You should also consider whether to launch more threads or more blocks: the choice determines occupancy and consequently performance, but the best balance also depends on the kernel and the hardware.
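
As a rough illustration of the grid-stride idea, the following plain-Python simulation (again not CUDAnative code) lets a fixed launch of `threads * blocks` threads cover an array of any length: each thread starts at its global index and advances by the total number of launched threads. The function name and the launch sizes are hypothetical.

```python
# Plain-Python simulation of a grid-stride loop; not CUDAnative code.
# Indices are 1-based to mirror Julia.


def grid_stride_kernel(data, threads, blocks):
    """Double every element using a fixed-size launch and a strided loop."""
    n = len(data)
    out = list(data)
    visits = [0] * n            # how many times each element was processed
    stride = threads * blocks   # total number of threads in the launch
    for block in range(1, blocks + 1):
        for thread in range(1, threads + 1):
            i = (block - 1) * threads + thread  # this thread's first element
            while i <= n:                       # loop condition doubles as bounds check
                out[i - 1] = 2 * data[i - 1]
                visits[i - 1] += 1
                i += stride                     # jump ahead by the whole grid
    return out, visits


data = list(range(1000))
out, visits = grid_stride_kernel(data, threads=32, blocks=4)
print(all(v == 1 for v in visits))
```

Here 128 simulated threads process 1000 elements, and the launch configuration can be tuned for occupancy independently of the array size.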

You can use @cuprintf to debug your index calculations, see this example.
