Hello,
I have a question about how BLAS threads are specified. For context, I am on my university's HPC in an interactive session on our shared memory cluster. Each node in the partition I am using has two CPUs with 12 cores each (so 24 cores per node). I am using an interactive session because I want to profile my code with the VS Code Julia extension (using @profview and @profview_allocs). I could do this on my home PC, but I have more cores to play with on the cluster.
I understand that Julia threads and BLAS threads are separate. My code performs many repeated operations on dense matrices, and I'd like to see the impact of increasing the BLAS thread count. I am using MKL.jl on Intel CPUs.
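To make the goal concrete, here is a minimal sketch of the kind of comparison I have in mind. The function name time_matmul, the matrix size, and the thread counts are placeholders for illustration, not my actual workload, and with MKL.jl a `using MKL` line would come first.

```julia
using LinearAlgebra

# Time one dense matrix product at several BLAS thread counts.
# `time_matmul` is a hypothetical helper; sizes and counts are arbitrary.
function time_matmul(n::Integer, thread_counts)
    A = rand(n, n)
    B = rand(n, n)
    C = similar(A)
    mul!(C, A, B)                        # warm-up so compilation is not timed
    original = BLAS.get_num_threads()    # remember the current setting
    results = Dict{Int,Float64}()
    for t in thread_counts
        BLAS.set_num_threads(t)
        actual = BLAS.get_num_threads()  # may be lower than requested
        results[actual] = @elapsed mul!(C, A, B)
    end
    BLAS.set_num_threads(original)       # restore the original setting
    return results
end

timings = time_matmul(1000, (1, 2, 4))
for t in sort(collect(keys(timings)))
    println("BLAS threads = ", t, "  time = ", timings[t], " s")
end
```

The results are keyed by the thread count BLAS actually reports, so a request that is silently capped shows up as a missing entry rather than a misleading timing.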
To do this I request an interactive session with 24 cores (so one entire node on our shared memory cluster).
I set the following in my VS Code settings.json:
{
    "terminal.integrated.env.linux": {
        "MKL_NUM_THREADS": "24"
    }
}
I also run export MKL_NUM_THREADS=24 in the VS Code integrated terminal.
Then I go to my Julia file and press Shift + Enter on the first line to start Julia in the VS Code integrated terminal.
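One check that could confirm whether the variable actually reaches the Julia process started this way is to read it back from ENV. This is only a sketch of that check; the variable name mkl_threads is just for illustration.

```julia
# Report whether MKL_NUM_THREADS is visible inside this Julia session.
mkl_threads = get(ENV, "MKL_NUM_THREADS", nothing)
if mkl_threads === nothing
    println("MKL_NUM_THREADS is not set in this Julia session")
else
    println("MKL_NUM_THREADS = ", mkl_threads)
end
```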
When I check, though, BLAS reports only 12 threads, even if I manually set the count higher:
using MKL, LinearAlgebra   # BLAS is LinearAlgebra.BLAS

Threads.nthreads()         # 1
BLAS.get_num_threads()     # 12
BLAS.set_num_threads(24)
BLAS.get_num_threads()     # Still 12
Does anyone know why, or whether I am missing a step?
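In case it helps with diagnosing this, below is a minimal sketch of the extra information I can collect from the session, assuming Julia 1.7 or later (where BLAS.get_config() is available). It lists the BLAS libraries that are loaded, the number of logical CPUs the session sees, and the BLAS thread count. With MKL.jl, a `using MKL` line would precede it.

```julia
using LinearAlgebra

# List the BLAS/LAPACK backends currently loaded (Julia 1.7+).
config = BLAS.get_config()
for lib in config.loaded_libs
    println("Loaded BLAS library: ", lib.libname)
end

# Logical CPU count reported to Julia; on an HPC node this can differ
# from the number of cores requested from the scheduler.
println("Sys.CPU_THREADS  = ", Sys.CPU_THREADS)

# Thread count BLAS currently reports.
println("BLAS threads     = ", BLAS.get_num_threads())
```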