LAPACK multithreading


Hello everyone,
I would like to know how the binaries on the Julia downloads page are generated. For example, what are the default compilation flags used? Would I get the same result if I compiled them myself on some (Linux) system, assuming I do not link to any system libraries?
I ask because I am going to be running Julia on a cluster and I am concerned I may not be taking full advantage of the compute node architecture.

As an example, diagonalising a 4096 by 4096 Hermitian matrix using eigen! takes around 22-23 seconds on my computer (with 8 threads), but 15-16 seconds on a node of a cluster (which has 40 threads). I don’t expect a 5x speedup, but how do I ensure that the LAPACK that is being called really does know that 40 threads are present, and is using them all?



Standard Julia builds use the version of LAPACK included with OpenBLAS, and build the latter with a limit of 16 threads. AFAICT one needs to edit deps/ in the Julia tree to get more. Be aware that this can make things worse for small and medium-sized problems; see discussion at this issue. (If your cluster uses Intel processors, consider building with MKL instead.)
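
To see what the BLAS/LAPACK backend is actually doing on a given node, something like the following should work. This is only a rough sketch, assuming Julia 1.6 or later (where `BLAS.get_num_threads` is available); the matrix size `n` is an arbitrary small value chosen so it runs quickly, not your 4096 case. As far as I know, OpenBLAS caps the request at its compile-time maximum, so comparing the value you ask for with the value reported afterwards tells you whether the build limit is in effect. The limit itself may differ between Julia versions.

```julia
using LinearAlgebra

# Number of hardware threads Julia detects on this machine
println("CPU threads:  ", Sys.CPU_THREADS)

# Number of threads the BLAS/LAPACK backend is currently set to use
println("BLAS threads: ", BLAS.get_num_threads())

# Ask for one BLAS thread per hardware thread; the backend may grant fewer
BLAS.set_num_threads(Sys.CPU_THREADS)
println("BLAS threads after request: ", BLAS.get_num_threads())

# Time a Hermitian eigendecomposition (n is an illustrative size)
n = 512
A = rand(ComplexF64, n, n)
H = Hermitian(A + A')

eigen!(copy(H))            # warm-up run so compilation is not timed
@time F = eigen!(copy(H))  # eigen! overwrites its argument, so pass a copy
```

Repeating the timing with different arguments to `BLAS.set_num_threads` (say 8, 16, 40) shows whether extra threads help at your problem size. The thread count can also be set before starting Julia through the `OPENBLAS_NUM_THREADS` environment variable.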



Thanks, Ralph! I’m trying to build with MKL, but facing several more issues relating to really old versions of gcc. I think I will just use the binary and see how it goes.



Maybe this helps: MKL and libm compile notes: macOS