I’m trying to understand the basics of GPU kernels in Julia. I’ve followed http://mikeinnes.github.io/2017/08/24/cudanative.html and the basic example for addition makes sense:

```
using CuArrays, CUDAnative

n = 1024
xs, ys, zs = CuArray(rand(n)), CuArray(rand(n)), CuArray(zeros(n))

function kernel_vadd(out, a, b)
    # compute this thread's global (1-based) index
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    out[i] = a[i] + b[i]
    return
end

# launch 1 block of n threads
@cuda (1, n) kernel_vadd(zs, xs, ys)
```
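One thing I already notice: with a single block this only works up to `n = 1024` on my card. For larger arrays I assume the kernel needs multiple blocks plus a bounds check, something like this (my own guess, not from the blog post):

```
# guarded variant of the kernel, for when n is not a multiple of the block size
function kernel_vadd_guarded(out, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)   # guard against out-of-range threads in the last block
        out[i] = a[i] + b[i]
    end
    return
end

threads = 1024             # max threads per block on my card
blocks  = cld(n, threads)  # enough blocks to cover all n elements
@cuda (blocks, threads) kernel_vadd_guarded(zs, xs, ys)
```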

On my graphics card, I’m limited to 1,024 threads per block and 4 GB of video memory. My question is: if I package up a GPU function for others to use, how can I “automatically” determine how many blocks/threads to launch on their graphics card? Additionally, what is the best practice for “wrapping” this `@cuda (blocks, threads) kernel(...)` call in a user-friendly function?
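For context, here is roughly the kind of wrapper I am imagining; the device-attribute query is a guess on my part, I don’t know the right API for it:

```
using CuArrays, CUDAnative, CUDAdrv

# hypothetical wrapper: allocate the output and pick the launch size per device
function vadd(a::CuArray, b::CuArray)
    out = similar(a)
    dev = CuDevice(0)
    # I assume the maximum threads per block can be queried something like this:
    max_threads = CUDAdrv.attribute(dev, CUDAdrv.MAX_THREADS_PER_BLOCK)
    threads = min(length(out), max_threads)
    blocks  = cld(length(out), threads)
    # (kernel_vadd would also need a bounds check whenever length(out)
    # is not a multiple of threads)
    @cuda (blocks, threads) kernel_vadd(out, a, b)
    return out
end
```

Is querying device attributes by hand like this the intended approach, or is there a built-in way to get a good launch configuration?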