Problems with LinearAlgebra functions within KernelAbstractions and CUDA

That @noinline function should probably have a type signature that forces specialization, so that the code is GPU-compatible. Maybe open an issue on StaticArrays.jl?
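For reference, here is a minimal sketch of what "forcing specialization" usually means in Julia. It assumes the problem is the usual heuristic where Julia may avoid specializing on a `Function`, `Type`, or `Vararg` argument that is only passed through to another call. I haven't checked that this is what happens in the function from your error, and all names below are hypothetical, not the actual StaticArrays.jl code.

```julia
# Hypothetical names throughout; this is not the StaticArrays.jl function in question.

@noinline inner(f, x) = f(x)

# `f` is only passed along here, never called directly, so Julia's heuristics
# may skip specializing this method on the concrete type of `f`.
@noinline passthrough(f, x) = inner(f, x)

# Adding a type parameter for `f` forces one specialization per function type.
@noinline passthrough_specialized(f::F, x) where {F} = inner(f, x)

passthrough(abs2, 3)
passthrough_specialized(abs2, 3)
```

Both calls give the same result on the CPU. The difference only matters to the compiler: an unspecialized method can end up as a dynamic call, which GPU compilation generally rejects.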