I am trying to run MCMC sampling for a model built with Turing.jl. Since different chains can be sampled in parallel using multiple processes, I want to use the following line:
# some lines in my code, model_sim.jl
sample_num = 1000
chain_num = 3
chain = sample(test_model, NUTS(0.65), MCMCDistributed(), sample_num, chain_num)
Since I want to do this for a lot of models that only differ in a few parameter values, I thought about doing this with job arrays on the cluster, with each model as a task, and with 3 CPUs assigned to each task.
#SBATCH --job-name=array-job # create a short name for the job
#SBATCH --output=slurm-%A.%a.out # stdout file
#SBATCH --error=slurm-%A.%a.err # stderr file
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=3 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --array=1-10%5 # job array with index values 1 to 10, but max 5 at once
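Inside model_sim.jl, each array task can pick its own model from the index that SLURM exports as SLURM_ARRAY_TASK_ID. A minimal sketch, assuming a hypothetical param_grid with one entry per model (the names and values are illustrative, not taken from the original code):

```julia
# Hypothetical parameter grid: one entry per array index (1 to 10 in the script above).
param_grid = [(alpha = a, beta = b) for a in 0.1:0.1:0.5 for b in (1.0, 2.0)]

# SLURM sets SLURM_ARRAY_TASK_ID for every task of a job array.
# The fallback "1" is an assumption that lets the script also run outside SLURM.
task_id = parse(Int, get(ENV, "SLURM_ARRAY_TASK_ID", "1"))

# Select the parameters for this task; the model would be built from them.
params = param_grid[task_id]
println("Task ", task_id, " uses parameters ", params)
```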
Although the jobs are parallelized, the code is recompiled each time, thus greatly reducing the benefit of parallelization. Is there a way to avoid recompiling on each new job in the job array aside from pre-compiling my code into a relocatable app?
Use a combination of pmap and ClusterManagers.jl. The way I organize my code is as follows:
using Distributed, ClusterManagers
addprocs(SlurmManager(N)) # add other options if needed
# long simulation code, defined with @everywhere so every worker has run_simulation
results = pmap(run_simulation, 1:N) # where N is the number of sims.
The first time run_simulation is run on each core, it will compile. The next time it runs on the same core, it will use the compiled version.
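The compile-once behaviour can be checked with a small timing sketch. It assumes two local workers from addprocs(2) in place of SlurmManager and a toy run_simulation; both are hypothetical stand-ins for the real setup:

```julia
using Distributed

# Assumption: two local workers stand in for the SLURM-launched ones.
nprocs() == 1 && addprocs(2)

# Toy workload (hypothetical); the real run_simulation would go here.
@everywhere run_simulation(i) = sum(abs2, rand(10^6)) + i

# First pass: each worker compiles run_simulation the first time it is called there.
first_pass = @elapsed pmap(run_simulation, 1:8)

# Second pass: the same worker processes reuse the code they already compiled.
second_pass = @elapsed pmap(run_simulation, 1:8)

println("first pass: ", first_pass, " s, second pass: ", second_pass, " s")
```

The gap between the two timings depends on the workload, so treat the printed numbers as a rough indication only.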
Another benefit of this is that the results of your simulations are simply stored as an array in results. This makes post-processing much easier (and actually you can make the post-processing parallel as well, since you have the worker processes already launched).
Thank you for sharing your workflow! I think this would be optimal if I am simulating N independent chains.
If my run_simulation function randomly samples parameter values and runs 3 chains, what’s a good way to build the dependency among these three chains into the pmap call? Is my only option to sample N parameter values, duplicate them 3 times, once for each chain, and store them for run_simulation to access later, prior to running pmap(run_simulation, 1:N)?
Maybe you want to move the pmap around and break up your run_simulation into something like
for i = 1:3
    sample_value = rand()
    # run_chain holds the complicated, long work
    pmap(x -> run_chain(sample_value), 1:N) # runs N simulations on m cores
end
Does that help? I have a feeling that this is not what you are looking for.
Not quite, thank you though!
What I’m looking for is closer to the following:
for model_i in 1:N
    sample_value = rand()
    for chain_i in 1:3
        run_chain(sample_value)
    end
    # check if the 3 independent chains mixed well and other stuff
end
It would be great if both the chain loop and the model loop could be parallelized somehow. I think pmap can’t be nested, so that’s the main challenge for me at the moment.
I usually use
# chains is n by 3
# can check if the 3 chains in each row are the same in parallel too, up to you
This really helps for doing a grid search over parameters, just like you would with a job array. Hopefully, this is helpful.
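A minimal sketch of how an n-by-3 array of chains could come out of a single, un-nested pmap. It assumes each model's parameter value is regenerated from a per-model seed, so the three chains of one model share it. The run_chain body, the seeding scheme, and the local workers are illustrative assumptions, not the code from this thread:

```julia
using Distributed
using Random

# Assumption: two local workers stand in for the SLURM-launched ones.
nprocs() == 1 && addprocs(2)
@everywhere using Random

# Hypothetical stand-in for the real sampler. The model seed reproduces the
# same sample_value for every chain of that model, while each chain gets its
# own random stream.
@everywhere function run_chain(seed, chain_i)
    sample_value = rand(MersenneTwister(seed))
    chain_rng = MersenneTwister(hash((seed, chain_i)))
    draws = sample_value .+ randn(chain_rng, 100)
    return (sample_value = sample_value, mean_draw = sum(draws) / length(draws))
end

N = 4
seeds = rand(UInt32, N)  # one seed per model

# Every (model, chain) pair becomes one job, so a single flat pmap covers both loops.
jobs = [(seed, chain_i) for seed in seeds, chain_i in 1:3]  # N-by-3 matrix of jobs
results = pmap(job -> run_chain(job...), vec(jobs))
chains = reshape(results, N, 3)  # chains[i, j] is chain j of model i
```

Row i of chains then holds the three chains of model i, so a per-row mixing check could itself be run with pmap over 1:N.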
Also, you can consider creating a system image for your code (see "Compiling Sysimages" in the Julia in VS Code docs and the PackageCompiler docs), so that it is already compiled before the job array runs.
Thanks for the tip, passing in seeds as a solution is unexpected but really helpful. Regarding the sysimage, I remember that the PackageCompiler docs mentioned that a sysimage from one machine can’t be used on another. So would I need to create a system image on the cluster? Or can I make one on my computer and somehow configure it before passing it to the job array on the cluster?
Great! Glad to have helped.
I haven’t used the sysimage on my HPC at all, but since most clusters are designed to have the same architecture and have a shared filesystem, making it on one node (with something like an interactive SLURM session) should be fine. When you connect to the other workers with addprocs, there is a kwarg, exeflags, which may let you load a sysimage. This guide will hopefully have everything you need.
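A rough sketch of passing a sysimage to workers through exeflags. To stay runnable as written, it reuses the sysimage of the current session; a custom image built with PackageCompiler on the cluster would replace sysimage_path, which is an assumption here and has not been tested on an HPC system. Local workers again stand in for SlurmManager:

```julia
using Distributed

# Assumption: in practice this would be the path of a custom sysimage built on
# the cluster. The current session's own sysimage is used so the sketch runs as-is.
sysimage_path = unsafe_string(Base.JLOptions().image_file)

# exeflags forwards command-line flags to every worker's julia process.
pids = addprocs(2; exeflags = `--sysimage=$sysimage_path`)

# Ask each new worker which sysimage it was started with.
worker_images = [
    remotecall_fetch(() -> unsafe_string(Base.JLOptions().image_file), p)
    for p in pids
]
println(worker_images)
```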
I may try it myself and see how it goes, as I imagine it would be very helpful.