Hi,
I have a function that iterates over all the elements of a four-dimensional array of CartesianIndex elements:
21×31×41×51 Array{CartesianIndex{4},4}:
[:, :, 1, 1] =
CartesianIndex(1, 1, 1, 1) … CartesianIndex(1, 31, 1, 1)
CartesianIndex(2, 1, 1, 1) CartesianIndex(2, 31, 1, 1)
CartesianIndex(3, 1, 1, 1) CartesianIndex(3, 31, 1, 1)
CartesianIndex(4, 1, 1, 1) CartesianIndex(4, 31, 1, 1)
CartesianIndex(5, 1, 1, 1) CartesianIndex(5, 31, 1, 1)
...
The thing is that the computation the function performs is 100% parallelizable: it has to do some independent operations for each element of the array and then take the product of the results at the end (not exactly a product, but something similar). That seems perfect for CUDA, as there are many elements to process, each one independent of the rest.
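To make it concrete, here is a minimal CPU sketch of the kind of thing the function does (`inds` and `f` are just placeholder names, and the toy per-element function stands in for the real, more involved computation):

```julia
# 21×31×41×51 Array{CartesianIndex{4},4}, like the one shown above
inds = collect(CartesianIndices((21, 31, 41, 51)))

# some independent computation for each index (toy example)
f(I::CartesianIndex{4}) = sin(sum(Tuple(I)))

partial = map(f, inds)    # fully independent per-element step
result  = prod(partial)   # combined at the end (not really a product in my case)
```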
Now the question is: can I directly work with these CartesianIndex elements in CUDA? Is this implemented? Or should I convert them to a 4-dimensional array and work with that? If I do have to convert to an array, how do I properly write a nested for loop (one loop per dimension of the CartesianIndex, so four here) while taking full advantage of CUDA's parallel capabilities?
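For reference, this is roughly what I would like to be able to write, assuming (and this is exactly my question) that an array of CartesianIndex can live in a CuArray and be broadcast over. The names `d_inds` and `d_partial` are just illustrative, reusing the `f` from the sketch above:

```julia
using CUDA

d_inds    = CuArray(inds)        # does CartesianIndex{4} work inside a CuArray?
d_partial = f.(d_inds)           # broadcast the independent per-element step on the GPU
result    = reduce(*, d_partial) # final combination as a reduction
```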
Thanks a lot,
Ferran.