Recommended env variables for most performant parallel/multithreading?

Hi all, I have two questions.

  1. What are the recommended values of the env variables (JULIA_NUM_THREADS, MKL/OPENBLAS_NUM_THREADS, …) if I want to speed up my code using Threads.@threads?

  2. To what should I set those variables (the former ones plus addprocs(x)) if, instead, I would like to use @distributed or pmap to get more performance?

The heaviest parts of my scripts usually involve diagonalizing very large Hermitian matrices and/or BLAS operations. Also (3.), is it possible to mix multithreaded code inside parallel loops and still gain performance, or should I avoid multithreading when I am already using parallel processes, and vice versa?

For more information, the Julia version I am using is

julia> versioninfo()

Julia Version 1.1.0
Commit 80516ca202* (2019-01-21 21:24 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Searching for W-2145 CPU @ 3.70GHz I reached this page, where it says I have 8 cores and 16 threads. Should the answer for 1. be JULIA_NUM_THREADS=16 and the answer for 2. be addprocs(8)? What about MKL/OPENBLAS_NUM_THREADS?

You’ll want the total number of threads in use to be 16, so for example addprocs(8) with JULIA_NUM_THREADS=2 (JULIA_NUM_THREADS is the number of threads used by @threads within a single process). I would also set the number of BLAS threads to the same value as JULIA_NUM_THREADS. However, if you call BLAS within a @threads loop, each loop thread starts its own set of BLAS threads and the CPU will probably be over-subscribed, so I would try to avoid that.
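As a rough sketch of that split (not a tuned configuration), something like the following could go at the top of a script. The constants are assumptions taken from the 8-core/16-thread figure quoted above, and I'm assuming Julia is launched with JULIA_NUM_THREADS=2 and that local workers started by addprocs inherit that environment variable from the master process.

```julia
using Distributed
using LinearAlgebra

# Assumed hardware figures from the question: 8 cores, 16 hardware threads.
const HW_THREADS = 16
const N_WORKERS = 8
const THREADS_PER_WORKER = HW_THREADS ÷ N_WORKERS   # 16 ÷ 8 = 2

# Launch Julia as `JULIA_NUM_THREADS=2 julia script.jl` so that @threads
# loops use 2 threads per process (assumption: workers inherit the variable).
addprocs(N_WORKERS)

# Give every process the same number of BLAS threads as Julia threads.
@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads($THREADS_PER_WORKER)
```

With this layout the master process mostly coordinates, while the 8 workers run the @distributed or pmap work with 2 threads each.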

EDIT: If you happen to have a case where it makes sense to call BLAS inside a @threads loop, then you should set the number of BLAS threads to 1.
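A minimal sketch of that case, assuming the workload is a batch of independent Hermitian matrices (the sizes and the helper name threaded_spectra are made up for illustration): BLAS is limited to one thread, and the @threads loop supplies the parallelism.

```julia
using LinearAlgebra

# One BLAS/LAPACK thread per call; the @threads loop provides the parallelism.
BLAS.set_num_threads(1)

# Hypothetical helper: diagonalize each matrix on whichever thread gets it.
function threaded_spectra(matrices)
    spectra = Vector{Vector{Float64}}(undef, length(matrices))
    Threads.@threads for i in 1:length(matrices)
        spectra[i] = eigvals(matrices[i])   # real eigenvalues of a Hermitian matrix
    end
    return spectra
end

# Illustrative sizes only; A + A' has an exactly real diagonal.
n, m = 100, 8
matrices = [(A = randn(ComplexF64, n, n); Hermitian(A + A')) for _ in 1:m]
spectra = threaded_spectra(matrices)
```

Each iteration writes to its own slot of spectra, so the threads need no locking. Whether this beats a serial loop with multithreaded BLAS depends on the matrix size, so it is worth timing both.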

Thanks! I didn’t know that JULIA_NUM_THREADS was the number of threads used within a single process.

By BLAS threads, do you mean that I should set both BLAS.set_num_threads() and MKL_NUM_THREADS equal to JULIA_NUM_THREADS, or that BLAS.set_num_threads(Threads.nthreads()) works for every BLAS and MKL/OpenBLAS operation?

One more question: if I only want to use multithreading and not parallel processes, I should set JULIA_NUM_THREADS=16, am I right?

I believe it should, yes.