I'm trying to debug a problem that may be due to julia not using a provided nodefile on a SLURM system. It appears that jobs are getting spun off on hosts where they shouldn’t be.
The SLURM batch script is:
#!/bin/bash
#SBATCH --email@example.com
#SBATCH --mail-type=ALL # Alerts sent when job begins, ends, or aborts
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
#SBATCH --mem=100G
#SBATCH --job-name=indiv_array
#SBATCH --array=1-5
#SBATCH --time=03-00:00:00 # Wall Clock time (dd-hh:mm:ss) [max of 14 days]
#SBATCH --output=indiv_array_%A_%a.output # output and error messages go to this file

export SLURM_NODEFILE=`generate_pbs_nodefile`
julia --machinefile $SLURM_NODEFILE indiv_array.jl
Does anyone (a) see anything wrong with this, and (b) have any suggestions for how I can check, from within the resulting Julia session, which nodes are actually being used, so I can see whether they correspond to the ones listed in SLURM_NODEFILE? I.e., is there a function I can call to print out where julia thinks it’s supposed to be running tasks?
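
For the nodefile side of that comparison, here is a rough sketch of what could be logged from the batch script just before the julia line, so the hosts SLURM handed out end up in the job's output file. `summarize_nodefile` is a hypothetical helper name (not a SLURM or Julia command), and the sketch assumes the nodefile lists one hostname per line, repeated once per task on that host.

```shell
#!/bin/bash
# Hypothetical helper, not part of SLURM or Julia: print each distinct host
# in a nodefile together with the number of times it is listed.
# Assumes one hostname per line, repeated once per task on that host.
summarize_nodefile() {
    local nodefile="$1"
    # Drop blank lines, then count how often each hostname occurs.
    grep -v '^[[:space:]]*$' "$nodefile" | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Only report when SLURM_NODEFILE is set and points at a readable file.
if [ -n "${SLURM_NODEFILE:-}" ] && [ -r "$SLURM_NODEFILE" ]; then
    echo "Hosts listed in $SLURM_NODEFILE:"
    summarize_nodefile "$SLURM_NODEFILE"
    echo "Host running the batch script: $(hostname)"
fi
```

With `--ntasks=16` and `--ntasks-per-node=8`, I would expect this to list two hosts with a count of 8 each; anything else would suggest the nodefile itself is not what I think it is.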