I have a simple script that I run under Slurm, and I’m having trouble understanding its output:
#MPItest.jl
using MPI
println("Testing MPI")
I run it on Slurm with the following sbatch script:
#!/bin/bash
#SBATCH -p free
#SBATCH --job-name=SRRS
#SBATCH --nodelist=hpc3-20-31
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1 ## number of cores the job needs
#SBATCH --constraint="intel&mlx5_ib" ## run only on nodes with updated IB firmware
#SBATCH --error=err-%j.log
#SBATCH --output=output-%j.log
#SBATCH --mem-per-cpu=8G
#SBATCH --mail-type=begin
#SBATCH --mail-type=end # send email when job ends
#SBATCH --mail-type=fail # send email if job fails
#SBATCH --mail-user=ernestob@uci.edu
echo "Number of Nodes: $SLURM_JOB_NUM_NODES" >> "julia-$SLURM_JOB_ID.out"
echo "Node List: $SLURM_JOB_NODELIST" >> "julia-$SLURM_JOB_ID.out"
echo "Number of ntasks: $SLURM_NTASKS" >> "julia-$SLURM_JOB_ID.out"
echo "Number of ntasks per node: $SLURM_NTASKS_PER_NODE" >> "julia-$SLURM_JOB_ID.out"
echo "Cpus per task: $SLURM_CPUS_PER_TASK" >> "julia-$SLURM_JOB_ID.out"
module load openmpi/4.1.2/gcc.11.2.0
module load hdf5/1.14.1/gcc.11.2.0-openmpi.4.1.2
# set the OpenMP thread count and these UCX parameters for openmpi
export OMP_NUM_THREADS=1
export UCX_TLS=rc,mm
export UCX_NET_DEVICES=mlx5_0:1
export UCX_ERROR_SIGNALS="SIGILL,SIGBUS,SIGFPE"
export JULIA_LOAD_PATH="$JULIA:/data/homezvol0/ernestob/.julia/environments/v1.10"
export JULIA_CPU_TARGET="generic;skylake-avx512,clone_all; skylake,clone_all;icelake-server,clone_all;"
export JULIA_MPI_LIBRARY="/opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi"
export ZES_ENABLE_SYSMAN=1
echo "Precompiling Master" >> "julia-$SLURM_JOB_ID.out"
$HOME/.juliaup/bin/julia --project=. -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()' >> "julia-$SLURM_JOB_ID.out"
echo "Starting Julia Test" >> "julia-$SLURM_JOB_ID.out"
mpirun -np $SLURM_NTASKS -mca pml ucx -mca btl ^uct -x UCX_NET_DEVICES=mlx5_0:1 $HOME/.juliaup/bin/julia --project=. --threads 1 /dfs6/pub/usr/Julia/MPItest.jl >> "julia-$SLURM_JOB_ID.out"
This produces the following output:
Number of Nodes: 2
Node List: hpc3-20-[23,31]
Number of ntasks: 4
Number of ntasks per node:
Cpus per task: 1
Precompiling Master
Starting Julia Test
e]0;Julia e]0;Julia e]0;Julia e]0;Julia e]0;Julia e]0;Julia e]0;Julia e]0;Julia Testing MPI
Testing MPI
Testing MPI
Testing MPI
What is e]0;Julia? And why is it printed 2x ntasks times? No matter how I change ntasks, the count is always double.
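For anyone who wants to look at the raw bytes, here is a minimal sketch of how the log could be inspected. It assumes the leading e shown above is really an ESC byte (0x1b) that got rendered as a letter and that each sequence ends in a BEL byte, which would make it an xterm-style "set window title" sequence. I have not confirmed that. The sample file is hypothetical and stands in for julia-$SLURM_JOB_ID.out.

```shell
# Hypothetical sample file standing in for julia-$SLURM_JOB_ID.out.
sample="$(mktemp)"

# Assumed byte layout: ESC ] 0 ; Julia BEL, repeated 8 times, then normal output.
for i in 1 2 3 4 5 6 7 8; do printf '\033]0;Julia\007'; done > "$sample"
printf 'Testing MPI\n' >> "$sample"

# Dump raw bytes so non-printing characters are visible (ESC shows as 033, BEL as \a).
od -c "$sample" | head -n 3

# Count the sequences: grep -o prints one line per match.
count=$(grep -o ']0;Julia' "$sample" | wc -l | tr -d ' ')
echo "title sequences: $count"

# Strip the sequences to recover the plain program output.
esc=$(printf '\033')
bel=$(printf '\007')
clean=$(sed "s/${esc}]0;Julia${bel}//g" "$sample")
echo "$clean"

rm -f "$sample"
```

Running the od and grep lines against the real log would show whether the bytes match this assumption and how many sequences each run actually emits.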