I am excited to start using Julia 1.10.0 since I have a particular set of simulations which are allocation heavy (many small allocations) and I believe the GC is the current bottleneck.
I want to make sure I am doing this right. To start, I think it makes sense to split the available threads between those used for GC and those used for computing.
My run script contains the following lines (summarized):
# runscript.slurm
#SBATCH --ntasks-per-node=12
GC_THREADS=6
COMPUTE_THREADS=6
main="my_simulation.jl"
export OMP_NUM_THREADS=$COMPUTE_THREADS
julia --project=@. --gcthreads=$GC_THREADS $main
The idea here is that I don’t want the GC threads to compete with the BLAS threads, right? So it wouldn’t make sense to set both to 12 (the total number of threads in this example).
Also, I can double-check the number of BLAS threads by calling BLAS.get_num_threads(). Is there a similar function for checking the number of GC threads?
Try this:
--gcthreads=6,1
For me it gives a significant improvement.
And I don’t think BLAS threads compete with GC threads because the GC stops everything else while running, but I might be wrong.
Threads.ngcthreads() (although it is not public API)
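A rough sketch that combines the two suggestions above. In Julia 1.10 the flag has the form --gcthreads=N[,M], where N is the number of threads for the mark phase and M (0 or 1) controls the concurrent sweep thread. The variable names here are made up for illustration, and the check is skipped when julia is not on the PATH.

```shell
#!/bin/bash
# Sketch: build the GC flag explicitly, then ask Julia what it configured.
GC_MARK_THREADS=6      # threads for the parallel mark phase
GC_SWEEP_THREADS=1     # 1 enables the concurrent sweep thread, 0 disables it
GC_FLAG="--gcthreads=${GC_MARK_THREADS},${GC_SWEEP_THREADS}"
echo "GC flag: ${GC_FLAG}"

# Only query Julia if it is installed; Threads.ngcthreads() needs Julia 1.10+
# and is not public API, so it may change in later releases.
if command -v julia >/dev/null 2>&1; then
    julia "${GC_FLAG}" -e 'println("GC threads: ", Threads.ngcthreads())' \
        || echo "query failed (is this Julia older than 1.10?)"
fi
```

Running this once on a compute node before the real job is a cheap way to confirm the flag was picked up.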
Note that, while setting OMP_NUM_THREADS works (because we explicitly check for it), you should rather set OPENBLAS_NUM_THREADS, because Julia isn’t using OpenBLAS with OpenMP threads but with pthreads.
Thanks!
I was using OMP_NUM_THREADS because my cluster has both Intel and AMD CPUs, and my run script is built so that it won’t use MKL if the architecture is AMD (it defaults to OpenBLAS). Since I could end up using either OpenBLAS or MKL depending on how Slurm dispatches the run, I thought I should use OMP_NUM_THREADS. Sorry if that’s completely wrong or bad practice (it was just my naive first attempt). If you have any recommendations on how to do this better, please let me know.
I suppose I could constrain Slurm to use only Intel CPUs and then use MKL_NUM_THREADS?
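One possible way to handle either backend without constraining the job, sketched under a few assumptions: export the same compute budget under both OPENBLAS_NUM_THREADS and MKL_NUM_THREADS, since each library reads only its own variable. The even GC/compute split and the fallback of 12 mirror the run script above and are not a recommendation. SLURM_NTASKS_PER_NODE is only set by Slurm when --ntasks-per-node is given.

```shell
#!/bin/bash
# Sketch: one thread budget, exported under both backend-specific names.
TOTAL_THREADS=${SLURM_NTASKS_PER_NODE:-12}   # falls back to 12 outside Slurm
GC_THREADS=$(( TOTAL_THREADS / 2 ))          # assumed even split, tune by benchmarking
COMPUTE_THREADS=$(( TOTAL_THREADS - GC_THREADS ))

# OpenBLAS ignores MKL_NUM_THREADS and MKL ignores OPENBLAS_NUM_THREADS,
# so setting both covers whichever library the job ends up loading.
export OPENBLAS_NUM_THREADS=$COMPUTE_THREADS
export MKL_NUM_THREADS=$COMPUTE_THREADS

echo "compute=${COMPUTE_THREADS} gc=${GC_THREADS}"
```

The julia invocation from the run script can then follow unchanged, using $GC_THREADS for --gcthreads.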
Good to know, I will try that today.
Regarding whether the threads compete, that would also be good to know.
I cannot answer this question, but you should know that MKL also works nicely on AMD CPUs… Just benchmark for yourself whether OpenBLAS or MKL is better for your use case…
In my experience, running MTK simulations multi-threaded only works with MKL…
Hm, yes, I suppose I will have to then. I thought I had read somewhere that Intel made MKL run worse if it detected an AMD CPU.