Using PLASMA instead of LAPACK?

Hi,
I recently tested PLASMA as a replacement for LAPACK on a single-socket, 32-core AMD Threadripper.
The results below are from running LU decomposition.
It seems that, above a certain scale, PLASMA achieves much higher parallel performance than LAPACK.

Is there currently a simple way to use PLASMA instead of LAPACK?
Or a separate package maybe?
The thing is, PLASMA uses different subroutine names, so it’s not ABI-compatible with LAPACK.

MathWorks (the company behind MATLAB) is among the sponsors of PLASMA.

PLASMA needs fairly recent GCC infrastructure because of the advanced OpenMP directives. Would that require a build of Julia from source for compatibility?

There may be something wrong with your LAPACK build: I get 150 LU GFLOPS in Julia with OpenBLAS/LAPACK on an older 8-core system. I do have respectable DRAM bandwidth, and you may not, but proper blocking should make that irrelevant. Nevertheless, the PLASMA performance is impressive.

However, note that on (some) Intel hardware, MKL is competitive with PLASMA, and MKL has a drop-in replacement for standard LAPACK.

In the long run, we look forward to the day when Julia/PARTR has scheduling tools that can do what PLASMA does, with less ugliness.

> However, note that on (some) Intel hardware, MKL is competitive with PLASMA, and MKL has a drop-in replacement for standard LAPACK.

That is actually surprising.
I guess MKL already does fancy task-parallel scheduling under the hood?

> There may be something wrong with your LAPACK build: I get 150 LU GFLOPS in Julia with OpenBLAS/LAPACK on an older 8-core system. I do have respectable DRAM bandwidth, and you may not, but proper blocking should make that irrelevant. Nevertheless, the PLASMA performance is impressive.

This may be due to the way I calculated the FLOPS.
Please take the absolute values with a grain of salt.
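
For anyone comparing numbers across machines, here is a minimal sketch of one common way to turn an LU timing into GFLOPS. It assumes the usual leading-order operation count of (2/3)·n³ for a dense n×n factorization; the thread does not say which formula was actually used, and the function names and timing values below are hypothetical and only for illustration.

```python
def lu_flops(n):
    """Leading-order flop count for LU of a dense n x n matrix.

    Assumption: the conventional (2/3) * n^3 estimate; lower-order
    terms are ignored, which only matters for small n.
    """
    return 2.0 * n**3 / 3.0


def gflops(n, seconds):
    """Convert a measured wall-clock time into GFLOPS (1e9 flops per second)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return lu_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical measurement, not a result from this thread.
    n = 10_000
    elapsed = 4.0  # seconds
    print(f"n = {n}: {gflops(n, elapsed):.1f} GFLOPS")
```

If two people use different operation counts (for example n³ instead of (2/3)·n³), their reported GFLOPS differ by a constant factor even for identical timings, so comparing raw wall-clock times at the same n is the safer check.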

Now that would be something worth waiting for!