My preferred way is something like this:
#!/usr/bin/env sh
#SBATCH -N 10
#SBATCH -n 8
#SBATCH -o %x-%j.out
#=
srun julia $(scontrol show job $SLURM_JOBID | awk -F= '/Command=/{print $2}')
exit
# =#
using MPIClusterManagers
MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)
println(workers()) # should have 80 workers here across 10 nodes (controlled by -n and -N above)
You put this in myscript.jl
and then sbatch myscript.jl
.
This is using A neat Julia/SLURM trick and MPIClusterManagers.jl.
ClusterMangers’s ElasticManager
is also quite useful for dynamically hooking up workers to e.g. a Jupyter session, if you prefer the interactive workflow.