I have the following *big picture* aim. Suppose I have a high-dimensional array, e.g. 5D or 6D. At each index of that array, an independent computation delivers a value that needs to be stored at that index. For example,

```
V[i1,i2,i3,i4,i5] = fun(x, i1,i2,i3,i4,i5)
```

I wanted to try whether I can hand over the computation of `fun(x, i1,i2,i3,i4,i5)` to a single thread on a GPU, and hence have many threads work on this job in parallel. So far so good.
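To make the big picture concrete, this is what the same computation looks like as a plain CPU loop (just a sketch; `fun` and `fill_cpu` here are hypothetical stand-ins, not part of my actual code):

```julia
# CPU reference: fill an array by evaluating an independent
# computation at every index. `fun` is a hypothetical placeholder.
fun(x, inds...) = x * sum(inds)

function fill_cpu(x, dims)
    V = zeros(Int64, dims)
    for I in CartesianIndices(V)   # all index tuples, column-major order
        V[I] = fun(x, Tuple(I)...)
    end
    return V
end
```

The GPU version would replace the loop with one thread per index.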

## Immediate task: map `threadIdx().x` to array indices

Ideally I would just use the Julia function `ind2sub` to get a cartesian index from a linear index, but that function is not exposed via `CUDAnative`, so I rewrote it for a particular case. Never mind that this is completely inefficient; at this point I just want to understand what's going on. So, I have the following example, which compiles without error. I am having trouble seeing the output, though: why do all of my attempts to copy the device array back to the host give me an error? First the version that works, then the error at the bottom.

- I am learning a lot here, so any comments are most welcome. In particular, is this the right way to attack a 6D array?
- It seems that the `@assert` call verifies that I'm doing this correctly, but I want to get this array back.
- In general, I am very confused that a function like `mod` or `tuple` or `@assert` works on the device, but something like `sub2ind` does not. What's the difference?

```
using CUDAnative, CUDAdrv

function do3D()
    V = zeros(Int64, 2, 3, 4)
    d_V = CuArray(V)
    @cuda threads=10 ind2sub3Dkernel(d_V)
    # all of these error:
    # copy!(V, d_V)
    # x = Array(d_V)
    # println(x[1])
    # return d_V
end

# The kernel. It cannot return a value, hence it writes into the supplied array:
# at each thread index, get the corresponding cartesian index, and
# recompose the linear index manually to check that this is correct.
function ind2sub3Dkernel(V::CuDeviceArray{Int64})
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    m = size(V)
    n = myind2sub3D(size(V), idx)
    # n = mytuple() # "works"
    @assert idx == n[1] + (n[2]-1)*m[1] + (n[3]-1)*m[1]*m[2]
    V[idx] = n[1] + (n[2]-1)*m[1] + (n[3]-1)*m[1]*m[2]
end

# Can I have a *device function* that returns a tuple? Seems I can.
function mytuple()
    (1, 2)
end

# My *implementation* of an old `ind2sub` version.
# Had to specialize to 3D because I cannot do splatting on the device?
# I know this is terrible code, but that's not the point. (I hope!)
function myind2sub3D(dims::Tuple{Integer,Vararg{Integer}}, ind::Int)
    ndims = length(dims)
    @assert ndims == 3
    stride = dims[1]
    for i = 2:ndims-1
        stride *= dims[i]
    end
    i2 = 0
    i3 = 0
    # a manual loop over i
    i = 2
    rest = rem(ind-1, stride) + 1
    i3 = div(ind - rest, stride) + 1
    ind = rest
    stride = div(stride, dims[i])
    i = 1
    rest = rem(ind-1, stride) + 1
    i2 = div(ind - rest, stride) + 1
    ind = rest
    stride = div(stride, dims[i])
    o = tuple(ind, i2, i3)
    # printing does not work:
    # @cuprintf("my indices are %ld, %ld, %ld\n", o[1], o[2], o[3])
    # @cuprintf("i have ")
    return o
    # original implementation:
    # sub = ()
    # for i = (ndims-1):-1:1
    #     rest = rem(ind-1, stride) + 1
    #     sub = tuple(div(ind - rest, stride) + 1, sub...)
    #     ind = rest
    #     stride = div(stride, dims[i])
    # end
    # return tuple(ind, sub...)
end
```
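For reference, the round trip that the kernel's `@assert` checks can be reproduced on the CPU with a simpler `div`/`rem` formulation of the same column-major conversion (a sketch for sanity-checking the arithmetic only; `ind2sub3D_cpu` is a hypothetical helper, not the device code above):

```julia
# Column-major: linear index -> 3-tuple, then recompose with
#   idx == i1 + (i2-1)*m1 + (i3-1)*m1*m2
function ind2sub3D_cpu(dims, ind)
    i1 = rem(ind - 1, dims[1]) + 1
    rest = div(ind - 1, dims[1])
    i2 = rem(rest, dims[2]) + 1
    i3 = div(rest, dims[2]) + 1
    return (i1, i2, i3)
end

dims = (2, 3, 4)
for idx in 1:prod(dims)
    n = ind2sub3D_cpu(dims, idx)
    @assert idx == n[1] + (n[2]-1)*dims[1] + (n[3]-1)*dims[1]*dims[2]
end
```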

Modifying the top-level function to convert back to an `Array` does this:

```
...
x = Array(d_V)
...
julia> x=cudaVFI.do3D()
ERROR: CUDA error: unspecified launch failure (code #719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] macro expansion at /home/floswald/.julia/packages/CUDAdrv/GyXD/src/base.jl:145 [inlined]
[2] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /home/floswald/.julia/packages/CUDAdrv/GyXD/src/memory.jl:161
[3] alloc at /home/floswald/.julia/packages/CUDAdrv/GyXD/src/memory.jl:157 [inlined] (repeats 2 times)
[4] CUDAdrv.CuArray{Int64,3}(::Tuple{Int64,Int64,Int64}) at /home/floswald/.julia/packages/CUDAdrv/GyXD/src/array.jl:33
[5] CUDAdrv.CuArray(::Array{Int64,3}) at /home/floswald/.julia/packages/CUDAdrv/GyXD/src/array.jl:217
[6] do3D() at /home/floswald/git/VFI/Julia/cudaVFI/src/cutest.jl:192
[7] top-level scope
```

Thanks!