That’s very unlikely to work. You cannot dynamically allocate memory inside a GPU kernel (see also this recent post: Modifying a thread-local vector within CUDA Dynamic Parallelism - #2 by vchuravy).
What should work though is to allocate all CuArrays outside the kernel, then inside the kernel convert the relevant `view`s into your arrays into `SMatrix`/`SVector`s and do the solve on StaticArrays only. (I don't have access to a GPU atm to check.)
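A minimal sketch of what I mean, assuming a batch of small (here 3×3) systems; the names `batched_solve_kernel!`, `As`, `bs`, `xs` and the storage layout are just placeholders, and I haven't been able to run this on a GPU:

```julia
using CUDA, StaticArrays

function batched_solve_kernel!(xs, As, bs)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= size(As, 3)
        # Build stack-allocated StaticArrays from views into global memory;
        # nothing is dynamically allocated inside the kernel.
        A = SMatrix{3,3}(@view As[:, :, i])
        b = SVector{3}(@view bs[:, i])
        x = A \ b  # solve entirely on StaticArrays
        # Write the result back element by element.
        for k in 1:3
            @inbounds xs[k, i] = x[k]
        end
    end
    return nothing
end

# Allocate all CuArrays outside the kernel.
n  = 1024
As = CUDA.rand(3, 3, n)
bs = CUDA.rand(3, n)
xs = CUDA.zeros(3, n)

@cuda threads=256 blocks=cld(n, 256) batched_solve_kernel!(xs, As, bs)
```

The point is that the `SMatrix`/`SVector` live in registers (or local memory), so the per-thread solve never touches the GPU allocator.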