I have a Julia script that runs a physics simulation. The simulation takes around 8 hours, and I want to run it with many random initial conditions, so I planned to run the script multiple times using a SLURM job array, with ArgParse.jl processing the command-line arguments that determine the parameters of the simulation, including the seed used for the randomization (which is what the job array ID controls). My script does not use multithreading or distributed computing in any way; I just want to run it many times with different initial conditions.
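For concreteness, the job-array wiring I have in mind is roughly the following sketch (the array range, fallback ID, and script name are placeholders, not my real values):

```shell
# In the real submission script this directive would sit with the others:
#
#   #SBATCH --array=1-100
#
# SLURM sets SLURM_ARRAY_TASK_ID per array task; I use it directly as the seed.
SLURM_ARRAY_TASK_ID="${SLURM_ARRAY_TASK_ID:-7}"   # set by SLURM; fallback for local testing
seed="$SLURM_ARRAY_TASK_ID"
echo "julia ./argparse.jl $seed"                  # the command each array task would run
```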
Unfortunately, I am somehow getting a segmentation fault from ArgParse.jl.
Here is a MWE.
`argparse.jl`:

```julia
using ArgParse
s = ArgParseSettings()
#add_arg_table(s, "arg1", Dict(:nargs=>1, :required=>true))
@add_arg_table s begin
    "arg1"
        help = "First argument"
        required = true
end
p = parse_args(s)
println("Argument 1 is: ", p["arg1"])
```
`argparse.sb`:

```bash
#!/bin/bash --login
#SBATCH --job-name=argparse       # Job name
#SBATCH --mail-type=NONE          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --nodes=1                 # Maximum number of nodes to be allocated
#SBATCH --ntasks-per-node=1       # Maximum number of tasks on each node
#SBATCH --cpus-per-task=1         # Number of processors for each task (want several because the BLAS is multithreaded, even though my Julia code is not)
#SBATCH --mem=2G                  # Memory (i.e. RAM) per NODE
#SBATCH --export=ALL
#SBATCH --constraint=intel18
#SBATCH --time=0-00:05:00         # Wall time limit (days-hrs:min:sec)
#SBATCH --output=argparse_%A.log  # Path to the standard output and error files relative to the working directory

echo "Date = $(date)"
echo "Hostname = $(hostname -s)"
echo "Number of Nodes Allocated = $SLURM_JOB_NUM_NODES"
echo "Number of Tasks Per Node = $SLURM_NTASKS_PER_NODE"
echo "Number of CPUs Per Task = $SLURM_CPUS_PER_TASK"
echo ""

which julia
julia ./argparse.jl 1
```
The log generated by running `sbatch argparse.sb`:

```text
Date = Fri Nov 29 05:59:40 PM EST 2024
Hostname = skl-027
Number of Nodes Allocated = 1
Number of Tasks Per Node = 1
Number of CPUs Per Task = 1
/mnt/home/leespen1/.juliaup/bin/julia
/var/lib/slurmd/job47051755/slurm_script: line 22: 1426934 Segmentation fault      (core dumped) julia ./argparse.jl 1
```
I have no idea where to go from here. Any advice on how to fix this (or an alternative workflow that would accomplish the same thing) would be appreciated. All the material I have found on using Julia in HPC has been about multithreading or distributed computing, which is not what I am trying to do.
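For reference, since I only need one positional argument, the fallback I can think of is dropping ArgParse.jl and reading `ARGS` directly, along the lines of this sketch (minimal error handling, no help text; untested on the cluster):

```julia
# Sketch of the same MWE without ArgParse.jl: read the positional
# argument straight from the built-in ARGS vector.
if isempty(ARGS)
    error("usage: julia argparse.jl <seed>")
end
seed = parse(Int, ARGS[1])  # job array ID passed on the command line
println("Argument 1 is: ", seed)
```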
PS: the segfault does not happen when I run `argparse.sb` as a bash script under `salloc`:

```text
leespen1@dev-intel18:~/Research/QuantumGateDesign.jl/cnot3$ salloc --nodes=1 --ntasks=1 --mem=2G --cpus-per-task=1 --constraint=intel18 --time=00:05:00
salloc: Granted job allocation 47051784
salloc: Waiting for resource configuration
salloc: Nodes skl-031 are ready for job
leespen1@skl-031:~/Research/QuantumGateDesign.jl/cnot3$ bash argparse.sb
Date = Fri Nov 29 06:11:15 PM EST 2024
Hostname = skl-031
Number of Nodes Allocated = 1
Number of Tasks Per Node =
Number of CPUs Per Task = 1
/mnt/home/leespen1/.juliaup/bin/julia
Argument 1 is: 1
```