Performance gotcha in linear algebra lu()

That is a fair point. But we shouldn’t miss the important part of the story: the performance of the BLAS-supported LU factorization is erratic for smaller matrix sizes, and in fact it is outperformed by the generic Julia version (both WITH PIVOTING).
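For anyone who wants to check this on their own machine, here is a minimal timing sketch. It is not the benchmark script behind the graphs in this post. The helper best_time and the list of sizes are made up for illustration, and LinearAlgebra.generic_lufact! is an unexported internal function, so its signature and default pivoting may differ between Julia versions.

```julia
using LinearAlgebra

# Hypothetical helper: time one in-place factorization on a fresh copy of A
# and keep the best of `reps` runs (the first run also pays for compilation).
function best_time(factorize!, A; reps = 20)
    best = Inf
    for _ in 1:reps
        B = copy(A)                      # both routines overwrite their input
        t = @elapsed factorize!(B)
        best = min(best, t)
    end
    return best
end

for n in (10, 50, 100, 200, 400)         # illustrative sizes only
    A = rand(n, n)
    t_lapack  = best_time(lu!, A)                            # library-backed path
    t_generic = best_time(LinearAlgebra.generic_lufact!, A)  # pure-Julia path (internal API)
    println("n = $n: lu! $(t_lapack) s, generic_lufact! $(t_generic) s")
end
```

Best-of-N timing with @elapsed is crude; a package such as BenchmarkTools would probably give steadier numbers, but this keeps the sketch dependency-free.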

With just a small change to the code (the “better” implementation of the generic LU factorization), the performance can be improved further.

image

The break-even point between lu! and generic_lufact! is around 200 equations. NB: For 10 equations or fewer, a static-array solver may be faster still when combined with the generic version of the factorization. (I don’t know whether a static array can be passed to the BLAS-supported solver.)
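To make the static-array remark concrete, here is a small sketch, assuming the third-party StaticArrays package is installed. The size 5 and the diagonal shift are arbitrary choices to keep the example well conditioned; this only shows what solving a small system with static arrays looks like and makes no claim about timings.

```julia
using LinearAlgebra
using StaticArrays   # third-party package; assumed to be installed

# Small, well-conditioned test system (size and shift chosen for illustration).
M = rand(5, 5) + 5I
v = rand(5)

A = SMatrix{5,5}(M)   # size is part of the type, so the solver can specialize on it
b = SVector{5}(v)

x_static = A \ b      # solved by StaticArrays' own routines
x_dense  = M \ v      # ordinary Matrix path, for comparison

println(maximum(abs.(x_static - x_dense)))  # difference between the two solutions
```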

I think it might be of interest to compare with the MKL solver. If anyone has access to it, would you please run the code in Testing LU · GitHub and post the generated graph?

EDIT: Additional results for complex matrices. Same computer as above.
image