Julia on a cluster using SLURM, dependencies

I am learning to use Julia on a cluster. The scheduler is of type SLURM.

My question is:

My file uses about 10 packages. How do I ensure these dependencies are installed so that the job executes successfully?

In my .jl file, do I include the following command?

import Pkg

It somewhat depends on how your cluster is set up and how you have installed julia.

First things first, make sure you have a neatly set up project environment (governed by a Project.toml file, kinda reminiscent of a python virtual env, but a bit cleaner and more modern in julia). Launch julia with julia --project="folder_containing_projecttoml_file" to make sure that file is loaded.

As long as there is some minimal level of consistency between the nodes you are using, and your storage is accessible from all nodes, this should be enough.

It might be a good idea to run an interactive slurm job the first time just to check everything is working properly. Start the interactive job, start julia with the appropriate project environment, go to the pkg mode, install the packages you care about, check that they import and do not raise weird errors related to the cluster environment. That way you also get to make sure that everything is precompiled once. Then you can simply always reuse that environment.
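
A minimal sketch of that one-time setup, meant to be run from a shell inside the interactive job (on many clusters something like srun --pty bash starts one, but the exact command and flags depend on the site). The function name install_packages, the folder $HOME/MyProject and the package Distributions are placeholders for illustration, not anything taken from an actual setup.

```shell
# install_packages PROJECT_DIR PACKAGE [PACKAGE...]
# Adds the packages to the project in PROJECT_DIR and precompiles them once,
# so later batch jobs that use the same project can load them directly.
install_packages() {
    local project_dir="$1"
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: install_packages PROJECT_DIR PACKAGE [PACKAGE...]" >&2
        return 2
    fi
    # Julia exposes the trailing command-line arguments as ARGS (a vector of strings).
    julia --project="$project_dir" \
        -e 'using Pkg; Pkg.add(ARGS); Pkg.precompile()' "$@"
}

# Example call (placeholder names):
# install_packages "$HOME/MyProject" Distributions
```

After this has run once, jobs started with the same --project folder should find the packages already installed and precompiled.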

Some complications might arise if the nodes are drastically different, but that is rarely the case. We can discuss it if that issue arises.


Okay I’ll go a bit more in-depth here

So I have my bash script to start as something like

#SBATCH --time=00:60:00
#SBATCH --mail-user=user_email@email.com
#SBATCH --mail-type=ALL

module purge
module load julia/1.8.1

I have a scratch folder which is the storage used to compute

To create my project I use

pkg> generate MyProject

then I change my bash script to

#SBATCH --time=00:60:00
#SBATCH --mail-user=user_email@email.com
#SBATCH --mail-type=ALL

module purge
module load julia/1.8.1
julia --project="folder_containing_MyProject"

What do I do next from here? Do I need to use Pkg.instantiate()? If so, where do I call it? Also, do I add the package names to the .toml file by hand, or do I just go ahead and run my file(s)?

Run this once with a script that installs the packages you want (or maybe even do it from an interactive slurm job so you can play around and double check everything). That would create a bunch of precompiled julia-related caches in the ~/.julia folder.

Then just run your julia work (with the same project file). The packages will be available because you are using the same project file.

If you want to be extremely careful, you can even print Pkg.status() and InteractiveUtils.versioninfo() at the start of your jobs so you have proof in your log files that everything you cared about is indeed pre-installed.
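
A small sketch of that log-file check for the batch script. Outside the REPL, versioninfo() is only available after loading InteractiveUtils. The helper name log_environment is made up for illustration, and the project folder is passed in as an argument.

```shell
# log_environment PROJECT_DIR
# Prints the Julia version, the active project file and its packages,
# so the job's log records which environment was actually used.
log_environment() {
    julia --project="$1" -e '
        using Pkg, InteractiveUtils
        versioninfo()
        println("Active project: ", Base.active_project())
        Pkg.status()
    '
}

# Example call (placeholder path):
# log_environment "$HOME/MyProject"
```

Calling it just before the real julia command puts the version and package list at the top of every job log.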

If you want to run the script script.jl, just add it to the end of your last line, i.e. julia --project="..." script.jl

EDIT: If you want to, you can edit the Project.toml file by hand (but NOT the Manifest file). You will need to run the instantiate command if you edit it by hand. But just using Pkg.add would perform the edits for you as well, and it is probably more convenient. In the Julia REPL you can press ] to enter the pkg management mode.
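
If you do edit Project.toml by hand, the follow-up step could be wrapped roughly like this. Two assumptions are mine rather than from the post: the sketch calls Pkg.resolve() before Pkg.instantiate() so the manifest is brought in line with the hand edits first, and it only looks for a file literally named Project.toml (Julia also accepts JuliaProject.toml). The helper name instantiate_project is hypothetical.

```shell
# instantiate_project PROJECT_DIR
# Refuses to run if PROJECT_DIR has no Project.toml, then brings the
# installed packages in line with the project file.
instantiate_project() {
    local project_dir="$1"
    if [ ! -f "$project_dir/Project.toml" ]; then
        echo "no Project.toml found in '$project_dir'" >&2
        return 1
    fi
    # resolve updates the manifest after hand edits; instantiate downloads
    # and installs whatever the manifest lists; status prints the result.
    julia --project="$project_dir" \
        -e 'using Pkg; Pkg.resolve(); Pkg.instantiate(); Pkg.status()'
}

# Example call (placeholder path):
# instantiate_project "$HOME/MyProject"
```

The early check is there because a mistyped project folder is an easy mistake to make in a batch script, and it is cheaper to catch it before Julia starts.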

Also, just making sure: you have tried this out on your local computer first, right? That is of course not the same, but I believe it is useful to check that you know how to install packages in general, etc, before adding the complications of using SLURM.

Yeah I did on my local machine, thank you for your help

If I use

#SBATCH --time=00:60:00
#SBATCH --mail-user=user_email@email.com
#SBATCH --mail-type=ALL

module purge
module load julia/1.8.1
julia --project="folder_containing_MyProject" script.jl

Where script.jl only contains a call as follows

open("./testing_write_proj.txt", "w") do file
    write(file, "This worked")
end

Then it is fine, however if I include the following in script.jl

using Package-Name

then the SLURM job doesn’t execute. Do I need to activate the project somehow? If I open Project.toml the package is listed, so it seems I installed it correctly within the project environment.

Even in an interactive SLURM I cannot seem to get the file to execute with the ‘using Package-Name’ inclusion

How did you install the packages to begin with?

If you specify the project flag, you should not need to activate anything explicitly.

Start an interactive job again and do some debugging. Try to extract information about your system, to verify the assumptions you have (e.g. the assumption that something has already been installed).

For instance, in the interactive job do versioninfo() and then do ] status to check whether you are indeed using the version you think you are using, the project file you think you are using, and the already installed libraries you believe you have installed. Run these on your local system too, to see what you can expect them to be on a working system.

For the moment it sounds like the packages were not installed.

EDIT: also enable logs of stdout and stderr in SLURM so you can see the error messages telling you why julia did not execute.
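
A sketch of what enabling the logs could look like at the top of the batch file. --output and --error are standard sbatch options and %j expands to the job ID; the file names and the log_header helper are placeholders. Without these options SLURM normally writes both streams to slurm-<jobid>.out in the directory the job was submitted from.

```shell
#!/bin/bash
#SBATCH --output=julia-job-%j.out   # stdout of the job
#SBATCH --error=julia-job-%j.err    # stderr, where Julia errors and stack traces go

# Print one header line so each log shows which job and node produced it.
# SLURM sets SLURM_JOB_ID inside a job; "unknown" is used when run by hand.
log_header() {
    echo "job ${SLURM_JOB_ID:-unknown} on $(uname -n) at $(date)"
}

log_header
```

If the .out file contains the header but nothing from Julia, the .err file is the place to look for the reason.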

Okay I followed your reply, when I check status I do indeed have the packages installed and so I tried running the script again and it just does not execute.

I used your suggestion of writing a script to install.

Do I need my script to call the project module?

That is, do I need it to be of the form

module TestProject

using Statistics
using Distributions

greet() = print("Hello World!")

# Write a marker file so it is easy to see that the job ran
open("./testing_write_proj.txt", "w") do file
    write(file, "worked 1:03 edit")
end

end # module TestProject

Given that running the interactive session shows everything being fine, I suspect your setup is ok.
You do not need to package your script as a module (but generally, using modules is a good way to organize things). Could you show your slurm log files? Maybe put this in your slurm batch file:

echo "before julia"
julia --project=... -e "using Pkg; Pkg.status()"
echo "after julia"

What do the log files say after this?

Have you checked that that same script (with a similar project file) behaves appropriately on your local machine?

I got it working. It turns out I ultimately needed to request more memory per CPU. Thanks for the help, it was enlightening.
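
For anyone who lands here with the same symptom, a sketch of the two pieces involved: requesting more memory, and checking afterwards whether a job ran out of it. --mem-per-cpu and sacct are standard SLURM, but the 4G value is a placeholder and job_memory_report is a made-up helper name. Whether sacct returns anything depends on job accounting being enabled on the cluster.

```shell
#!/bin/bash
#SBATCH --mem-per-cpu=4G   # placeholder value; size it to what the job needs

# job_memory_report JOB_ID
# Shows the final state, the requested memory and the peak memory use of a job.
# A State of OUT_OF_MEMORY means the job was killed for exceeding its request.
job_memory_report() {
    sacct -j "$1" --format=JobID,JobName,State,ExitCode,ReqMem,MaxRSS
}

# Example call from a login node (placeholder job ID):
# job_memory_report 123456
```

Comparing MaxRSS against ReqMem for a failed job gives a rough idea of how much more memory to request.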
