CUDA.jl - Sub-Vector Indexing Problem Inside CUDA Kernel

Slicing an array like that creates a new array, and dynamic allocations of that kind are not allowed in device code. StaticArrays implements slicing by returning another StaticArray, which is GPU compatible, but in general it's better to use a view here or simply iterate over the indices manually.
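As a rough sketch of both suggestions, assuming a made-up task where each thread sums a window of `w` consecutive elements: the kernel names, `x`, `out`, and `w` are all hypothetical and not taken from the original question. The first kernel uses a view, the second iterates over the indices directly.

```julia
using CUDA

# Hypothetical kernel: thread i sums x[i], ..., x[i+w-1] through a view.
function window_sum_view!(out, x, w)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        # `x[i:i+w-1]` would try to allocate a new array on the device.
        # A view only wraps the parent array and the index range.
        window = @view x[i:i+w-1]
        acc = zero(eltype(out))
        for j in 1:length(window)
            acc += window[j]
        end
        out[i] = acc
    end
    return nothing
end

# Same computation, iterating over the indices manually instead.
function window_sum_manual!(out, x, w)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        acc = zero(eltype(out))
        for j in i:i+w-1
            acc += x[j]
        end
        out[i] = acc
    end
    return nothing
end

# Example launch with illustrative sizes.
x = CUDA.rand(Float32, 1024)
w = 4
out = CUDA.zeros(Float32, length(x) - w + 1)
out_manual = CUDA.zeros(Float32, length(x) - w + 1)

threads = 256
blocks = cld(length(out), threads)
@cuda threads=threads blocks=blocks window_sum_view!(out, x, w)
@cuda threads=threads blocks=blocks window_sum_manual!(out_manual, x, w)
```

Both kernels should give the same result. The manual loop is the most explicit, while the view keeps the code closer to the slicing syntax from the question.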