Use of CartesianIndices with CUDA?

Change

tot[i] = y[1]+y[2]

to

tot[i] = y[i][1]+y[i][2]

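You can pass CartesianIndices(x) directly to a GPU kernel. It stores only the axis ranges (a OneTo(len) per dimension) instead of constructing an array of indices, so there is no index array to build or copy to the device.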
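A minimal CPU-only sketch of the indexing fix, in case it helps. The array size and the names x, y, tot are assumptions for illustration, and the plain loop stands in for the per-thread index you would compute inside the kernel:

```julia
# Hypothetical 3x4 array; only its shape matters here.
x = zeros(Float32, 3, 4)

# Lightweight: holds the axes (OneTo(3), OneTo(4)), not an array of indices.
y = CartesianIndices(x)

# Plain-bits types like this can be passed to a kernel by value.
@assert isbits(y)

tot = zeros(Int, length(y))
for i in eachindex(tot)   # stand-in for the thread's linear index
    # y[i] is the CartesianIndex for linear index i;
    # y[i][1] and y[i][2] are its row and column.
    tot[i] = y[i][1] + y[i][2]
end
```

Indexing y with a single integer converts the linear index to a CartesianIndex, which is why y[i][1] works where y[1] does not give a per-element row.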