Question about Setting BLAS Threads on Cluster


I have a question about how BLAS threads are specified. For context, I am in an interactive session on my university’s HPC shared-memory cluster. The partition I am using has nodes with two CPUs of 12 cores each (24 cores per node). I am using an interactive session because I want to profile my code with the VS Code Julia extension (using @profview and @profview_allocs). I could do this on my home PC, but I have more cores to play with on the cluster.

I understand that Julia threads and BLAS threads are separate. I have code that performs many repeated operations on dense matrices, and I’d like to see the impact of increasing the BLAS thread count. I am using MKL.jl on Intel CPUs.
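
To make the kind of measurement concrete, here is a minimal sketch of timing a dense matrix product at several BLAS thread counts. The helper name `time_matmul`, the matrix size, and the thread counts are illustrative assumptions, not taken from my actual code; with MKL.jl, `using MKL` would go before `using LinearAlgebra`.

```julia
using LinearAlgebra  # assumption: in an MKL setup, `using MKL` is loaded first

# Time C = A * B once per requested BLAS thread count.
# `time_matmul` is a hypothetical helper; n and the thread counts are arbitrary.
function time_matmul(n::Integer, thread_counts)
    A = rand(n, n)
    B = rand(n, n)
    C = similar(A)
    results = NamedTuple{(:requested, :actual, :seconds),Tuple{Int,Int,Float64}}[]
    for t in thread_counts
        BLAS.set_num_threads(t)
        actual = BLAS.get_num_threads()   # may differ from t if the library caps it
        mul!(C, A, B)                     # warm-up run so compilation is not timed
        secs = @elapsed mul!(C, A, B)
        push!(results, (requested = t, actual = actual, seconds = secs))
    end
    return results
end

for r in time_matmul(1000, (1, 2, 4))
    println("requested = ", r.requested, ", actual = ", r.actual,
            ", time = ", round(r.seconds; digits = 4), " s")
end
```

Recording both the requested and the reported thread count makes it visible when a request is silently capped.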

To do this, I request an interactive session with 24 cores (one entire node on our shared-memory cluster).

I set the following in my VS Code settings.json:

    "terminal.integrated.env.linux": {
        "MKL_NUM_THREADS": "24"
    }

I also run export MKL_NUM_THREADS=24 in the VS Code integrated terminal.

Then I go to my Julia file and press Shift + Enter on the first line to start Julia in the VS Code integrated terminal.

When I check, though, I only get 12 BLAS threads, even if I manually set the count higher.

    using MKL, LinearAlgebra

    Threads.nthreads()        # 1
    BLAS.get_num_threads()    # 12
    BLAS.set_num_threads(24)  # manually request 24 threads
    BLAS.get_num_threads()    # Still 12
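
For anyone trying to reproduce this, the following is a rough sketch of how the relevant values could be collected from inside the Julia session: whether MKL_NUM_THREADS actually reached the process, how many CPU threads Julia sees, and which BLAS library is loaded. The helper name `blas_thread_report` is hypothetical, and `BLAS.get_config()` assumes Julia 1.7 or newer.

```julia
using LinearAlgebra  # assumption: in an MKL setup, `using MKL` is loaded first

# Gather the settings that bear on the available BLAS thread count.
# `blas_thread_report` is a hypothetical helper name.
function blas_thread_report()
    return (
        env_mkl       = get(ENV, "MKL_NUM_THREADS", "unset"),  # as seen by this process
        cpu_threads   = Sys.CPU_THREADS,                       # logical CPUs Julia detects
        julia_threads = Threads.nthreads(),
        blas_threads  = BLAS.get_num_threads(),
        blas_libs     = [basename(lib.libname) for lib in BLAS.get_config().loaded_libs],
    )
end

report = blas_thread_report()
for name in propertynames(report)
    println(rpad(string(name), 14), " = ", getproperty(report, name))
end
```

If `env_mkl` prints "unset", the variable never reached the Julia process; if `blas_libs` does not list an MKL library, the MKL-specific variable would not be expected to have any effect.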

Does anyone know why, or whether I am getting a step wrong?
