I’m running some machine learning code on a Linux HPC cluster that is managed by a PBS scheduler. The code mostly trains a bunch of random forests using MLJ.jl and DecisionTree.jl, and DecisionTree.jl uses multithreading to train random forests. However, I’m not confident that the multithreading is actually happening when I run my code on the cluster: some of my jobs seem to take a lot longer on the cluster than they did on my laptop (where multithreading seems to work as expected).
I don’t really know how PBS works under the hood, but I would imagine that it provides some kind of virtual machine for your code to run on. So I would hope that multithreading works about the same as it would on my laptop.
So far the only test I’ve come up with is this one. I created a test PBS script, `test.pbs`:
```bash
#!/bin/bash -l
export JULIA_NUM_THREADS=12
julia test.jl
```
Here `test.jl` contains the following:
```julia
println("Number of Julia threads: $(Threads.nthreads())")
```
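Printing `nthreads()` only shows how many threads Julia started, not that those threads were handed any work. A slightly extended `test.jl` could also record which thread ran each iteration of a `Threads.@threads` loop. This is a minimal sketch assuming Julia 1.5 or later; the variable names are illustrative only.

```julia
using Base.Threads

# Record which thread executed each iteration of a threaded loop.
n = 10 * nthreads()
owner = zeros(Int, n)
@threads for i in 1:n
    owner[i] = threadid()   # each iteration writes its own slot, so there is no data race
end

used = sort(unique(owner))
println("Number of Julia threads: ", nthreads())
println("Thread IDs that ran loop iterations: ", used)
```

Typically every thread shows up in `used`, but the scheduler does not guarantee that, so a missing ID or two is not necessarily a problem. This check also cannot tell whether the threads are running on separate cores; that needs a timing comparison.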
Then I submit the PBS job with this:
```bash
qsub -l nodes=1:ppn=16,mem=2gb,walltime=01:00:00 -q mangi test.pbs
```
And the output of the job is this:
```
Number of Julia threads: 12
```
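Seeing 12 threads does not rule out the threads being packed onto fewer cores than requested, which would match the slowdown described above. One way to probe that is to time the same CPU-bound work serially and with `Threads.@threads`. The sketch below is a hypothetical stand-in (the `busywork` function is not a random forest and is not part of MLJ.jl or DecisionTree.jl), assuming Julia 1.5 or later:

```julia
using Base.Threads

# Hypothetical CPU-bound stand-in for one unit of work (e.g. one tree).
function busywork(seed::Int, iters::Int)
    x = float(seed)
    for _ in 1:iters
        x = sin(x) + 1.0
    end
    return x
end

function serial_run(ntasks::Int, iters::Int)
    out = zeros(ntasks)
    for i in 1:ntasks
        out[i] = busywork(i, iters)
    end
    return out
end

function threaded_run(ntasks::Int, iters::Int)
    out = zeros(ntasks)
    @threads for i in 1:ntasks
        out[i] = busywork(i, iters)   # distinct index per iteration, no race
    end
    return out
end

ntasks, iters = 4 * nthreads(), 2_000_000

# Warm up so that compilation time is not included in the measurements.
serial_run(2, 10); threaded_run(2, 10)

t_serial = @elapsed serial_run(ntasks, iters)
t_thread = @elapsed threaded_run(ntasks, iters)
println("Threads: ", nthreads(),
        "  serial: ", round(t_serial; digits=2), " s",
        "  threaded: ", round(t_thread; digits=2), " s",
        "  speedup: ", round(t_serial / t_thread; digits=1), "x")
```

If the speedup on the cluster is close to the thread count, the threads are probably getting their own cores; a speedup near 1x would suggest they are sharing a core, or only a few cores.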
So setting `JULIA_NUM_THREADS` in the PBS script seems to work as expected. Besides `export JULIA_NUM_THREADS=12`, is there anything else I need to do to make sure multithreading works correctly on a PBS-managed HPC cluster?
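A related thing worth checking is whether the thread count Julia sees agrees with what the scheduler actually allocated to the job. The sketch below just reports the relevant values. It assumes Julia 1.6 or later (for `BLAS.get_num_threads`), and the environment variable names are assumptions: `PBS_NUM_PPN` is typically exported by Torque-style PBS (the `nodes=...:ppn=...` syntax above), `NCPUS` by PBS Pro, and a given site may set neither.

```julia
using LinearAlgebra

# Core-count variables that a PBS job *may* export; which ones exist depends on
# the PBS flavour and the site configuration (assumption, not guaranteed).
for var in ("PBS_NUM_PPN", "NCPUS", "OMP_NUM_THREADS", "JULIA_NUM_THREADS")
    println(rpad(var, 18), get(ENV, var, "(unset)"))
end

println(rpad("Julia threads", 18), Threads.nthreads())
println(rpad("BLAS threads", 18), BLAS.get_num_threads())
# May count every logical CPU on the node, not just the cores given to this job.
println(rpad("Sys.CPU_THREADS", 18), Sys.CPU_THREADS)

# If the scheduler advertises a core count, warn when Julia threads exceed it.
allocated = tryparse(Int, get(ENV, "PBS_NUM_PPN", get(ENV, "NCPUS", "")))
if allocated !== nothing && Threads.nthreads() > allocated
    @warn "More Julia threads than allocated cores" nthreads=Threads.nthreads() allocated
end
```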