@cuda threads and blocks confusion

The API has recently been simplified: https://github.com/JuliaGPU/CUDA.jl/blob/4eb99b9f53acfc02a01f92d4a0a2b219bf8994cc/src/indexing.jl#L32-L36

But yes, it’s best to use the occupancy API (`launch_configuration` in CUDA.jl). It only works with 1D threads/blocks, so you need to extend the result to multiple dimensions yourself. Alternatively, keep the launch 1D and just convert the linear thread index to an appropriate 2D one in your kernel.
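To make the second option concrete, here is a minimal sketch of the index arithmetic in plain Julia, so it can be checked on the CPU. The helper names `linear_to_2d` and `launch_dims` are made up for this example and are not part of CUDA.jl. It assumes 1-based, column-major indexing, and the thread cap would normally come from the occupancy API rather than being hard-coded.

```julia
# Hypothetical helper: map a 1-based linear index to a 1-based (row, col)
# pair for a column-major array with `nrows` rows.
@inline function linear_to_2d(i::Integer, nrows::Integer)
    row = (i - 1) % nrows + 1
    col = (i - 1) ÷ nrows + 1
    return row, col
end

# Hypothetical helper: pick a 1D launch size for `n` elements given a
# thread cap. In real code the cap would be something like
# `launch_configuration(kernel.fun).threads`, where
# `kernel = @cuda launch=false my_kernel(args...)`.
function launch_dims(n::Integer, max_threads::Integer)
    threads = min(n, max_threads)
    blocks = cld(n, threads)   # round up so every element is covered
    return threads, blocks
end

# CPU check: the helper agrees with Julia's own CartesianIndices.
dims = (3, 4)
for i in 1:prod(dims)
    @assert linear_to_2d(i, dims[1]) == Tuple(CartesianIndices(dims)[i])
end

# Inside a kernel launched with 1D threads/blocks, this would be used
# roughly as follows (not run here, since it needs a GPU):
#   i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
#   if i <= length(A)
#       row, col = linear_to_2d(i, size(A, 1))
#       @inbounds A[row, col] = row + col
#   end
```

The bounds check `i <= length(A)` matters because rounding the block count up usually launches a few more threads than there are elements. Inside a kernel, `CartesianIndices(A)[i]` should give the same mapping without a hand-written helper.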