LAPACK multithreading

question
build
#1

Hello everyone,
I would like to know how the binaries on the Julia downloads page are generated. For example, what are the default compilation flags? Would I get the same result if I compiled Julia myself on some (Linux) system, assuming I do not link to any system libraries?
I ask because I am going to be running Julia on a cluster and I am concerned I may not be taking full advantage of the compute node architecture.

As an example, diagonalising a 4096 by 4096 Hermitian matrix using eigen! takes around 22-23 seconds on my computer (with 8 threads), but 15-16 seconds on a node of a cluster (which has 40 threads). I don’t expect a 5x speedup, but how do I ensure that the LAPACK being called really does know that 40 threads are present, and is using them all?


#2

Standard Julia builds use the version of LAPACK included with OpenBLAS, and build the latter with a limit of 16 threads. AFAICT one needs to edit deps/blas.mk in the Julia tree to get more. Be aware that this can make things worse for small and medium-sized problems; see discussion at this issue. (If your cluster uses Intel processors, consider building with MKL instead.)
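
A minimal sketch for checking this from within Julia, assuming Julia 1.6 or later (where BLAS.get_num_threads is available). The matrix size n and the thread counts in the loop are placeholders chosen so the snippet runs quickly, not values from this thread; set n = 4096 to reproduce the case above. Whether a request above the build-time limit is honoured depends on how the BLAS library was compiled, so read the value back rather than assuming it took effect.

```julia
using LinearAlgebra

# Hardware threads visible to Julia on this machine
println("CPU threads:  ", Sys.CPU_THREADS)

# Threads the BLAS/LAPACK backend is currently set to use
println("BLAS threads: ", BLAS.get_num_threads())

# Ask for one BLAS thread per hardware thread, then read the value back:
# the backend may cap the request at its compile-time limit.
BLAS.set_num_threads(Sys.CPU_THREADS)
println("BLAS threads after request: ", BLAS.get_num_threads())

# Time a Hermitian eigendecomposition at a few thread counts.
# n is kept small so the sketch runs quickly (placeholder value).
n = 512
A = randn(ComplexF64, n, n)
H = Hermitian(A + A')

for t in (1, 2, 4)
    BLAS.set_num_threads(t)
    eigen(H)                      # warm-up run, so compilation is not timed
    elapsed = @elapsed eigen(H)   # eigen copies H; eigen! would overwrite it
    println("threads = ", BLAS.get_num_threads(), "  time = ", elapsed, " s")
end
```

If the timings stop improving well below the number of hardware threads, that is consistent with the thread limit mentioned above, but it can also simply reflect how well the eigensolver scales at that matrix size.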


#3

Thanks, Ralph! I’m trying to build with MKL, but I am running into several more issues caused by really old versions of gcc. I think I will just use the binary and see how it goes.


#4

Maybe this helps: MKL and libm compile notes: macOS
