Cartesian Indices Sequence on the GPU

I have an array of integers i, e.g. [3,1,2], and I want to map them to Cartesian indices [(1, 3), (2, 1), (3, 2)].

On the CPU, the easy way to create this array is CartesianIndex.(enumerate(i)). However, this does not work on the GPU with CUDA arrays, because “Scalar indexing is disallowed”.
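For reference, a minimal sketch of both calls (assuming CUDA.jl; i_cpu and i_gpu are just illustrative names). The GPU line is commented out because it is the one that triggers the scalar-indexing error:

using CUDA

i_cpu = [3, 1, 2]
CartesianIndex.(enumerate(i_cpu))    # CartesianIndex(1, 3), CartesianIndex(2, 1), CartesianIndex(3, 2)

i_gpu = CuArray([3, 1, 2])
# CartesianIndex.(enumerate(i_gpu))  # errors: scalar indexing is disallowed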

Other than converting the array back and forth between CPU and GPU, is there any way to map a GPU array i to a list of Cartesian indices? I found a solution:

to_ordered_index(i) = CartesianIndex.(1:length(i), i)

but I don’t know if this is optimal.
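Roughly, the usage I have in mind looks like this (again assuming CUDA.jl; the broadcast over the range and the CuArray runs entirely on the GPU, so no scalar indexing is involved):

using CUDA

to_ordered_index(i) = CartesianIndex.(1:length(i), i)

i_gpu = CuArray([3, 1, 2])
oi = to_ordered_index(i_gpu)   # CuArray of CartesianIndex(1, 3), CartesianIndex(2, 1), CartesianIndex(3, 2)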

In my use case, the index array i can contain zeros that are filtered out; for example, [3,0,1,2,0] would be filtered to [3,1,2]:

mask = i .!= 0
oi = to_ordered_index(i[mask])

It is very important that invalid Cartesian indices like (2,0) do not show up anywhere since they could cause errors during backpropagation.
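Put together, the filtered path would look roughly like this (still assuming CUDA.jl, and that boolean indexing of a CuArray stays on the GPU). The mask is applied to the raw integer array before any CartesianIndex is built, so a zero entry never becomes an invalid index like (2, 0):

using CUDA

to_ordered_index(i) = CartesianIndex.(1:length(i), i)

i = CuArray([3, 0, 1, 2, 0])
mask = i .!= 0                  # CuArray{Bool}
oi = to_ordered_index(i[mask])  # CartesianIndex(1, 3), CartesianIndex(2, 1), CartesianIndex(3, 2)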