Slow lower triangular solves when compared to full Cholesky

Doesn’t `Mf.L` first have to copy the triangular factor into a newly allocated matrix? That copy costs O(n^2), the same order as the triangular solve itself, so it is a significant part of the measured time.
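The copy can be checked directly. The sketch below assumes `Mf` is a dense `Cholesky` from the LinearAlgebra stdlib with the factor stored as the upper triangle (the default when factoring a `Symmetric` matrix). In that case `C.L` builds a new lower-triangular matrix, while `C.U'` is only a lazy adjoint of the stored factor. If `Mf` is a sparse (CHOLMOD) factorization, the details may differ. The test matrix and the helpers `getL` and `getUadj` are hypothetical, chosen only for illustration.

```julia
using LinearAlgebra

# Hypothetical test problem: a random symmetric positive-definite matrix.
n = 500
A = randn(n, n)
S = Symmetric(A' * A + n * I)   # Symmetric defaults to the upper triangle (uplo = :U)
C = cholesky(S)                 # dense Cholesky; factor stored as upper triangle, so C.uplo == 'U'

# With only U stored, C.L materializes a transposed copy of C.factors (an n-by-n allocation),
# whereas C.U' just wraps the stored factor lazily.
getL(F) = F.L
getUadj(F) = F.U'

getL(C); getUadj(C)             # warm up so compilation is not counted below
bytes_L    = @allocated getL(C)
bytes_Uadj = @allocated getUadj(C)

println("C.uplo = ", C.uplo)
println("C.L shares storage with C.factors: ", parent(C.L) === C.factors)
println("bytes allocated by C.L  : ", bytes_L)
println("bytes allocated by C.U' : ", bytes_Uadj)
```

For an n-by-n `Float64` factor, `bytes_L` should be at least `8 * n^2`, while `C.U'` allocates at most a small wrapper.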

When you interpolate `$(Mf.L)` into `@btime`, that expression is evaluated once before benchmarking starts, so the copy is precomputed and excluded from the reported time.
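To see what the interpolation hides, here is a rough emulation using only `@elapsed` from Base instead of BenchmarkTools. Under the same dense-`Cholesky` assumption as above, it times three variants: copying the factor on every call, solving with a factor computed once outside the timed code (which is what `$(Mf.L)` effectively does), and solving with the lazy `F.U'`, which avoids the copy altogether. The function names and the `mintime` helper are hypothetical, and the minimum over repetitions is only a crude stand-in for `@btime`.

```julia
using LinearAlgebra

n = 500
A = randn(n, n)
C = cholesky(Symmetric(A' * A + n * I))   # dense Cholesky, upper factor stored
b = randn(n)

solve_with_copy(F, b) = F.L \ b   # pays the O(n^2) copy on every call
solve_hoisted(L, b)   = L \ b     # factor precomputed outside, like @btime $(Mf.L) \ b
solve_no_copy(F, b)   = F.U' \ b  # lazy adjoint of the stored upper factor, no copy

Lpre = C.L                        # the copy is paid once, here, outside the timed code

# Minimum wall time over several runs after one warm-up call.
function mintime(f, args...; reps = 50)
    f(args...)
    minimum(@elapsed(f(args...)) for _ in 1:reps)
end

t_copy    = mintime(solve_with_copy, C, b)
t_hoisted = mintime(solve_hoisted, Lpre, b)
t_nocopy  = mintime(solve_no_copy, C, b)

println("F.L \\ b (copy each call): ", t_copy)
println("precomputed L \\ b       : ", t_hoisted)
println("F.U' \\ b (no copy)      : ", t_nocopy)
```

With BenchmarkTools, `@btime $(C.L) \ $b` corresponds to `solve_hoisted`, while `@btime $C.L \ $b` interpolates only `C` and therefore includes the copy in every sample, like `solve_with_copy`.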
