That’s great! Looking forward to seeing it born!
Last time I tried, I was able to avoid this issue by running a very small computation sequentially on one node, before running the full parallel computation. The rationale was that if precompilation is performed sequentially by a single node, then there wouldn’t be conflicts between nodes at the beginning of the parallel run. Not sure this is guaranteed to work, but maybe worth trying?
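A minimal sketch of that warm-up workaround in a submission script (the package and script names are placeholders, reused from later in this thread):

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1

# Warm-up: let a single task precompile everything first,
# so the parallel tasks below only read the finished cache.
srun --nodes=1 --ntasks=1 julia --project -e 'using MyPackage'

# Full parallel run; no precompilation races between tasks now.
srun julia --project my_script.jl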
@greatpet, consider adding those tips to our guide; they are very helpful.
Yes, that’s something which has worked here and there, but I guess it boils down to how many invalidations are caused during processing. Some jobs might trigger recompilation… That being said, this workaround is not consistent enough, unfortunately.
Writing this line in the submission script doesn’t seem to take priority over the one in ~/.bashrc. I submitted this SLURM script while ~/.bashrc contained an environment variable pointing to another version of Julia, and it turned out that the Julia actually executing the code was not the julia-1.9.3 I specified in the SLURM script.
Do I also have to make sure that no other version of Julia is specified in ~/.bashrc before I submit such a SLURM script?
Yes, the first entries in PATH take precedence over later ones. If you prepend the Julia path to the existing contents of PATH, then the version specified in the SLURM submission script will take precedence:
export PATH=/path/to/julia-1.9.3/bin:$PATH
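For example, a minimal submission script with that prepend might look like this (the path and script name are placeholders):

#!/bin/bash
#SBATCH --ntasks=1

# Prepend so this Julia wins over whatever ~/.bashrc puts on PATH.
export PATH=/path/to/julia-1.9.3/bin:$PATH

julia --version   # should now report 1.9.3
julia my_script.jl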
Ah, that makes sense. Thanks a lot!
The way I use Julia on a SLURM cluster: I manage the installation with juliaup, then submit an interactive job like this:
$ salloc --x11 --time=1:00:00 --nodes=1 --ntasks-per-node=126 --constraint=mil --qos=debug
On my cluster this imports the entire environment from the calling shell, so I don’t need to set up PATH etc. Then I start Julia the normal way:
$ julia
> using Distributed
> addprocs(126)
> @everywhere using MyPackage
> @everywhere include("my_script.jl")
This works. Submitting a job with sbatch also works. The Julia compiler knows when it needs to recompile packages (usually when I switch between Milan and Cascade Lake nodes; besides the different architecture, they run different OSes too). Compilation takes some time, so I try to run my jobs on the same type of nodes.
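For reference, a rough sbatch equivalent of the interactive session above might look like this (same placeholder names as before):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=126

# Spawn 126 local workers and run the same steps as the REPL session.
julia -e 'using Distributed;
          addprocs(126);
          @everywhere using MyPackage;
          @everywhere include("my_script.jl")'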
P.S. If your environment variables are not propagated from the login shell into the SLURM session, I believe sbatch --export=ALL ... or setting the SLURM_EXPORT_ENV variable may be useful.
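For example (job.sh is a placeholder):

$ sbatch --export=ALL job.sh      # forward the caller's full environment
$ export SLURM_EXPORT_ENV=ALL     # or set this once before submitting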
Thanks! I’m curious about SLURM’s interactive mode, but I’m not very familiar with it yet.
@WuSiren For a simple interactive job:
$ srun --nodes=1 --ntasks-per-node=1 --pty /bin/bash
Now I recommend this guide:
https://juliahpc.github.io/
(In particular, item (4) in my old long answer in this thread is not considered ideal.)