Understanding grid-stride loops

Mostly. config.blocks isn’t a maximum, it’s a suggested minimum. So in principle you never need a grid-stride loop, since you can almost always launch more blocks (i.e. grow the grid) instead. However, you can’t do that in every dimension: the y and z grid dimensions are limited to 65535 blocks, versus 2^31 - 1 for x. A very large grid can also put additional pressure on the block scheduler, whereas a simple while loop inside the kernel doesn’t.
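To make the trade-off concrete, here is a minimal sketch of the grid-stride pattern. It is a CPU-side Python simulation of the index arithmetic, not CUDA.jl code, and the names grid_stride_indices, simulate_launch and blocks_to_cover are hypothetical. Indices are 0-based here; in a CUDA.jl kernel the same calculation would use blockIdx(), blockDim(), threadIdx() and gridDim(), which are 1-based.

```python
def grid_stride_indices(block_idx, thread_idx, block_dim, grid_dim, n):
    """Indices one thread visits in a 1-D grid-stride loop (0-based simulation)."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    stride = block_dim * grid_dim           # total number of threads in the grid
    visited = []
    while i < n:                            # the "simple while loop" inside the kernel
        visited.append(i)
        i += stride
    return visited


def simulate_launch(n, threads, blocks):
    """Run every (block, thread) pair sequentially; count how often each element is touched."""
    hits = [0] * n
    for b in range(blocks):
        for t in range(threads):
            for i in grid_stride_indices(b, t, threads, blocks, n):
                hits[i] += 1
    return hits


def blocks_to_cover(n, threads):
    """Smallest number of blocks so that every element gets its own thread (ceiling division)."""
    return -(-n // threads)


if __name__ == "__main__":
    n, threads = 10_000, 256
    # A small grid still covers every element exactly once; each thread just loops more.
    assert all(h == 1 for h in simulate_launch(n, threads, blocks=4))
    # A grid large enough for one element per thread: the loop body runs at most once.
    big = blocks_to_cover(n, threads)
    assert all(h == 1 for h in simulate_launch(n, threads, blocks=big))
    print("blocks needed without striding:", big)
```

With 4 blocks of 256 threads, thread 0 handles 10 elements; launching blocks_to_cover(10_000, 256), i.e. 40 blocks, gives every element its own thread, which is the "launch more blocks" alternative described above. The strided version stays correct for any grid size, which is what makes it useful when the grid cannot grow further.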

None of this is specific to CUDA.jl, though, so refer to the NVIDIA blog post for further details and advantages: https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
