Hello,

I have an application that needs sparse arrays with more than two dimensions, and I would like to try running it on the GPU because the workload is embarrassingly parallel. My plan was to build a `CuArray` with element type `TensorVal` using the function below. The function does run, but only if I use `@allowscalar`. From my understanding, it's OK that this part runs on the CPU because I'm just initializing the array. However, everything breaks when I try to use this array in the `cuda_sparse` function: I get an index out of bounds error and the REPL crashes. I'm not sure whether the issue is in the initialization or in the execution. Any help would be appreciated! If there are better ways to do this, I'd love to hear them.

```
struct TensorVal
    i::Int32
    j::Int32
    k::Int32
    val::Float32
end

function get_non_zero_gpu(F3::SparseArray)
    num_nonzero = length(nonzero_values(F3))
    F3_non_zero = CuArray{TensorVal}(undef, (num_nonzero,))
    for (idx, val) in nonzero_pairs(F3)
        F3_non_zero[idx] = TensorVal(idx[1], idx[2], idx[3], val)
    end
    return F3_non_zero
end
```
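One thing that may be worth checking in the function above: `idx` from `nonzero_pairs` is a three-dimensional index into `F3`, but it is also used to index the one-dimensional `F3_non_zero`. That could leave some slots of the `undef` array unwritten (or write out of bounds), and garbage `i`, `j`, `k` values would then explain an out-of-bounds error later on. Below is a minimal sketch of an alternative that fills a plain `Vector` on the CPU with a running counter and copies it to the GPU once, which also avoids `@allowscalar`. It assumes `SparseArray`, `nonzero_pairs`, and `nonzero_values` come from SparseArrayKit.jl; the name `entries` is made up for illustration.

```julia
using CUDA
using SparseArrayKit  # assumption: source of SparseArray, nonzero_pairs, nonzero_values

struct TensorVal
    i::Int32
    j::Int32
    k::Int32
    val::Float32
end

function get_non_zero_gpu(F3::SparseArray)
    num_nonzero = length(nonzero_values(F3))
    # Fill a host-side vector; `n` is the linear position in the output,
    # `idx` is the 3-D index of the nonzero entry in F3.
    entries = Vector{TensorVal}(undef, num_nonzero)
    for (n, (idx, val)) in enumerate(nonzero_pairs(F3))
        entries[n] = TensorVal(idx[1], idx[2], idx[3], val)
    end
    # One host-to-device copy instead of many scalar writes.
    return CuArray(entries)
end
```

Since `TensorVal` has only `Int32` and `Float32` fields, it is an `isbits` type and can be stored in a `CuArray` directly.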

For reference, this is how I am trying to use the array (basically a high-dimensional dot product):

```
function cuda_sparse(cuF3_sparse, cuPhi1, cuPhi2, cuPhi3)
    f = (f3_data) -> f3_data.val * cuPhi1[f3_data.i] * cuPhi2[f3_data.j] * cuPhi3[f3_data.k]
    return mapreduce(f, +, cuF3_sparse)
end
```
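To narrow down whether the initialization or the GPU reduction is at fault, a CPU-side reference of the same sum can help. The sketch below is only a debugging aid, not part of the original code: `check_and_contract` is a hypothetical helper that takes host copies of the data, verifies every stored index against the length of the corresponding vector, and computes the same contraction serially.

```julia
struct TensorVal
    i::Int32
    j::Int32
    k::Int32
    val::Float32
end

# Hypothetical helper: validate indices, then compute
# sum over entries of val * phi1[i] * phi2[j] * phi3[k] on the CPU.
function check_and_contract(entries::Vector{TensorVal}, phi1, phi2, phi3)
    total = 0.0
    for t in entries
        # Throws a BoundsError if an entry holds an invalid index,
        # e.g. because its slot was never initialized.
        checkbounds(phi1, t.i)
        checkbounds(phi2, t.j)
        checkbounds(phi3, t.k)
        total += t.val * phi1[t.i] * phi2[t.j] * phi3[t.k]
    end
    return total
end
```

It could be called with host copies, for example `check_and_contract(Array(cuF3_sparse), Array(cuPhi1), Array(cuPhi2), Array(cuPhi3))`. If this throws a `BoundsError`, the problem is likely in how the array was filled rather than in `mapreduce`.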