Optimising iterative solution of small system

It looks like you aren’t updating n.J in this loop? If so, you can just compute its LU factorization once and re-use it.

Also, it seems like it would be a lot easier, and possibly faster, to use SVector and SArray rather than MVector and MArray in this code. Then you can ditch all the “in-place” operations, write in a more natural style, and it still won’t allocate.