I recently tested PLASMA as a replacement for LAPACK on a single-socket, 32-core AMD Threadripper.
The results below are from running LU decomposition.
Above a certain problem size, PLASMA seems to achieve much higher parallel performance than LAPACK.
Is there currently a simple way to use PLASMA instead of LAPACK?
Or is there perhaps a separate package for it?
The catch is that PLASMA uses different subroutine names, so it is not ABI-compatible with LAPACK and cannot simply be swapped in.