I’m trying to understand the basics of GPU kernels in Julia. I’ve followed http://mikeinnes.github.io/2017/08/24/cudanative.html, and the basic example for addition makes sense:
```julia
using CuArrays, CUDAnative

n = 1024
xs, ys, zs = CuArray(rand(n)), CuArray(rand(n)), CuArray(zeros(n))

function kernel_vadd(out, a, b)
    # global thread index
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    out[i] = a[i] + b[i]
    return
end

# launch with 1 block of n threads
@cuda (1, n) kernel_vadd(zs, xs, ys)
```
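For arrays bigger than one block, I assume the launch has to be split across several blocks, with a bounds check in the kernel since the last block can run past the end of the array. This variant is my own guess, not from the blog post:

```julia
function kernel_vadd_blocks(out, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    # guard: the last block may have threads past the end of the array
    if i <= length(out)
        out[i] = a[i] + b[i]
    end
    return
end

n = 10_000
xs, ys, zs = CuArray(rand(n)), CuArray(rand(n)), CuArray(zeros(n))

threads = 1024            # hard-coded to my card's per-block limit for now
blocks = cld(n, threads)  # enough blocks to cover all n elements
@cuda (blocks, threads) kernel_vadd_blocks(zs, xs, ys)
```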
On my graphics card, I’m limited to 1,024 threads per block and 4 GB of video memory. My question is: if I package up a GPU function for others to use, how can I “automatically” determine how many blocks/threads to allocate on their graphics card? Additionally, what is the best practice for “wrapping” the `@cuda (blocks, threads) kernel(...)` call?
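What I have in mind is a wrapper along these lines, though I’m guessing at the CUDAdrv API for querying device limits (`attribute` and `MAX_THREADS_PER_BLOCK` are assumptions on my part, and the exact names may be wrong):

```julia
using CUDAdrv, CUDAnative

# Hypothetical wrapper: choose threads/blocks from the device's limits
# instead of hard-coding them. Assumes CUDAdrv exposes the maximum
# threads per block as a device attribute query.
function vadd(out, a, b)
    dev = CUDAdrv.CuDevice(0)
    max_threads = CUDAdrv.attribute(dev, CUDAdrv.MAX_THREADS_PER_BLOCK)
    threads = min(length(out), max_threads)
    blocks = cld(length(out), threads)
    @cuda (blocks, threads) kernel_vadd_blocks(out, a, b)
    return out
end
```

Is that the right approach, or is there a built-in way to pick a launch configuration?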