That is a fair point. But we shouldn’t miss the important part of the story: the performance of the BLAS-supported LU factorization is erratic for smaller matrix sizes, and in fact it is outperformed by the generic Julia version (both WITH PIVOTING).
With just a small change to the code (the “better” implementation of the generic LU factorization), the performance can be improved further.
The break-even point between `lu!` and `generic_lufact!` is around 200 equations. NB: For 10 equations or fewer, the static-array solver implementation may provide further improvements in speed with the generic version of the factorization. (I don’t know if the static array can be passed to the BLAS-supported solver.)
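For anyone who wants to look for the break-even point on their own machine without running the full script, here is a minimal sketch. Note the assumptions: `naive_lu!` and `best_time` are hypothetical names I made up for illustration, and `naive_lu!` is a hand-written LU with partial pivoting that only stands in for the generic Julia code path; it is not the actual `generic_lufact!` from `LinearAlgebra`. The only library call used is the public `lu!`.

```julia
using LinearAlgebra

# Hypothetical stand-in for a generic (non-BLAS) LU with partial pivoting.
# Overwrites A with the factors (unit-lower L below the diagonal, U on and
# above it) and returns the row permutation p such that A_original[p, :] == L*U.
function naive_lu!(A::AbstractMatrix)
    n = size(A, 1)
    n == size(A, 2) || throw(DimensionMismatch("matrix must be square"))
    p = collect(1:n)
    @inbounds for k in 1:n
        # Partial pivoting: find the largest magnitude in column k, rows k..n
        kp = k
        amax = abs(A[k, k])
        for i in k+1:n
            a = abs(A[i, k])
            if a > amax
                kp = i
                amax = a
            end
        end
        iszero(amax) && throw(SingularException(k))
        if kp != k
            # Swap rows k and kp, and record the swap in the permutation
            for j in 1:n
                A[k, j], A[kp, j] = A[kp, j], A[k, j]
            end
            p[k], p[kp] = p[kp], p[k]
        end
        # Scale the column below the pivot to form the multipliers (column of L)
        pivinv = inv(A[k, k])
        for i in k+1:n
            A[i, k] *= pivinv
        end
        # Update the trailing submatrix, column by column (column-major friendly)
        for j in k+1:n
            Akj = A[k, j]
            for i in k+1:n
                A[i, j] -= A[i, k] * Akj
            end
        end
    end
    return p
end

# Best-of-`reps` wall-clock time of f on a fresh copy of A
# (taking the minimum also discards the compilation cost of the first call).
function best_time(f, A; reps = 20)
    t = Inf
    for _ in 1:reps
        B = copy(A)
        dt = @elapsed f(B)
        t = min(t, dt)
    end
    return t
end

for n in (10, 50, 200, 500)
    A = rand(n, n)
    println("n = ", n,
            "   lu!: ", best_time(lu!, A),
            "   naive_lu!: ", best_time(naive_lu!, A))
end
```

The timings will differ with the CPU, the BLAS library, and the number of BLAS threads, so the crossover you see need not sit at 200. To try complex matrices, replace `rand(n, n)` with `rand(ComplexF64, n, n)`.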
I think it might be of interest to compare with the MKL solver. If anyone has access to it, would you please run the code in Testing LU · GitHub and post the generated graph?
EDIT: Additional results for complex matrices. Same computer as above.