Asking for an official guide to Julia installation, deployment, and use on a Linux cluster

Is there an official guide to Julia installation, deployment, and use on a Linux cluster, for someone who is new to both Julia and clusters, like me?

What I know: the cluster uses SLURM for job scheduling, and I was able to submit and run Python or MATLAB scripts (both are pre-installed on the cluster) with some simple sbatch commands.

Thanks!

2 Likes

Are you a cluster admin or a user?

You may be interested in ClusterManagers.jl.

2 Likes

I’ve been using Julia on clusters, and here are some of my observations / recommendations.

(1) I often encounter old Julia installations on clusters, so I install the latest stable version myself.

(2) If you install Julia via juliaup, it adds $HOME/.juliaup/bin to the PATH environment variable of your login shell by modifying e.g. $HOME/.bashrc. However, when you run a job through the cluster’s queuing system, this PATH setting will typically not be applied, so you need to manually add that path in your job script, or alternatively call the julia binary executable directly, which juliaup places at e.g.

$HOME/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/bin/julia

Alternatively you could directly download Julia’s binary tarballs without juliaup.
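Either way, the job script needs to be told explicitly where julia lives. A minimal sketch (the paths are the examples from above and will differ on your system):

#!/bin/bash
# .bashrc is typically not sourced in a batch job, so make julia visible
# here explicitly (or call the binary by its full path instead).
export PATH="$HOME/.juliaup/bin:$PATH"
julia --version    # sanity check: confirm the expected Julia is found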

(3) Starting from version 1.9, Julia caches compiled binaries, which can cause crashes on clusters where different nodes have different (variations of) CPU architectures. So I always set the environment variable
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
before running Julia, to make sure that the cached binaries are compatible with all the nodes that potentially run my job. The above line needs to be customized if you care about specifying the actual CPU architecture on your cluster to maximize performance. (See documentation here.)
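To do that customization, it can help to first see which CPUs your nodes actually report; a small sketch (the srun invocation is an assumption and depends on your cluster setup):

# Print the CPU name Julia detects, first on the login node, then on a compute node.
julia -e 'println(Sys.CPU_NAME)'
srun -n 1 julia -e 'println(Sys.CPU_NAME)'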

(4) Some clusters have tight resource limits for login nodes, so I need to set the environment variable

export OMP_NUM_THREADS=1

to be able to launch julia successfully on login nodes. Additionally, the resource limits of login nodes can cause precompilation to fail when I add a package via Pkg, so I sometimes request an interactive compute job just to run the Pkg operations.
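A sketch of that workaround (the srun flags and resource amounts are only examples; the right options depend on your cluster):

export OMP_NUM_THREADS=1
# Open an interactive Julia session on a compute node and run the Pkg
# operations (add, instantiate, precompile) there instead of on the login node.
srun -n 1 --cpus-per-task=4 --mem=8G --pty julia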

(5) If you want to set up a Julia installation for other users, it’s best to just provide Julia itself but not packages, though there are ways to provide a centralized package depot by using the environment variable JULIA_DEPOT_PATH. (The details are a little bit tricky and discussed elsewhere.)
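As a very rough sketch of that approach (the shared path is hypothetical, and the caveats above still apply):

# JULIA_DEPOT_PATH is a colon-separated list of depots; the first entry is the
# user-writable one, later entries (e.g. a read-only shared depot) are searched after it.
export JULIA_DEPOT_PATH="$HOME/.julia:/shared/software/julia-depot"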

(6) Set julia’s --heap-size-hint command line option, available since v1.9, to the amount of memory you requested in the queuing system. Otherwise Julia may think it has access to all the RAM of a node and not run GC aggressively enough.
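For example (a sketch; the numbers are arbitrary, and leaving a bit of headroom below the requested amount is just a precaution, not a documented rule):

#SBATCH --mem=16G
# Ask Julia's GC to keep the heap roughly below the memory granted by SLURM.
julia --heap-size-hint=14G myscript.jl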

P.S. I also hope that some kind of “official” guide is available, as I stumbled upon these issues in bits and pieces before finding out the solutions.

15 Likes

Thanks! I’m just an ordinary user of the cluster. Is ClusterManagers.jl still suitable for me?

Haven’t used it yet, but I think it’s not essential for getting started. It’s useful if you want to automate queue submissions for many jobs.

1 Like

Thank you very much for your detailed response (although there is a lot of it that I can’t fully understand now)! :handshake: :handshake:

Yes, I have downloaded Julia’s binaries, and I can run Julia scripts with the command julia myscript.jl from the path .../julia-1.9.3/bin/.

It seems that I just didn’t set up the environment variables correctly, because I don’t quite understand how to add Julia to the environment variables globally on the cluster. Even with the code you provided above, I have no idea what it means. :disappointed_relieved:

All I want is to be able to use Julia on the cluster just like Python. This cluster seems to allow users to run software they have installed themselves, but I don’t know how to install and configure it. I’ve read the cluster’s documentation, which has a short introduction to using module to manage software, but the content was sketchy and I didn’t understand it. Would that be useful for configuring Julia?

If Julia is not pre-installed on your cluster, you need to know how to run the programs you’ve installed yourself using absolute paths. Also, it helps if you know how to set environment variables in a shell script. I could walk you through some of the steps, but maybe it’ll all be clear if you read about these things.

1 Like

That’s a great start!

As a next step, I’d advise incrementally building a shell script automating the steps you need to take to run your Julia program. Such a script could initially look like this, saved as run.sh alongside myscript.jl:

#!/bin/bash

export PATH=$PATH:/path/to/julia-1.9.3/bin
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"

julia myscript.jl

Begin with baby steps, putting almost nothing in myscript.jl, in order to check that you can at least run things correctly on your cluster.

You can test your shell script interactively on the login node (but make sure it does not try to consume too many resources):

shell> ./run.sh

And if that works, you can try submitting it as a SLURM job; the way to do this will depend on your cluster configuration, but it could look like:

shell> sbatch -n 1 run.sh

Using module could help you replace a bunch of environment settings in your submission script with a simple module load julia command. But I’d save this for a later stage (and only if you need it). For now I’d say you can live without it, and you have more important things to learn first.
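For reference, a module-based variant could eventually look like this sketch (the module name julia is hypothetical; check what your site actually provides):

module avail julia    # list the Julia versions (if any) provided by the site
module load julia     # would replace the manual PATH export in run.sh
julia myscript.jl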

1 Like

Yes, as you said, I realized that the problem stemmed from my own unfamiliarity with setting environment variables on Linux systems, and after some learning, the problem is now basically solved. Thank you very much! :handshake: :handshake:

Thank you very much for your kind help! :handshake: :handshake: :handshake:

I realized the problem might come from the setting of environment variables, so after some learning I took the following steps to meet my basic needs:

  1. Add export PATH=$PATH:/path/to/julia-1.9.3/bin to the .bashrc file.
  2. Run command source .bashrc in the home directory.

Now I can submit the Julia script as a SLURM job and run it correctly.

But may I ask you what this line of code is for?

1 Like

And do you have any further suggestions on how to run Julia programs better on the cluster? Thanks!!!

But may I ask you what this line of code is for?

Julia compiles code for one or more CPU types. You have to set a CPU target that is general enough to cover all CPU types in your cluster, but specialized enough to achieve a good performance. If you are not doing this correctly your code might work on some CPUs of your cluster, but not on others…

1 Like

This might be due to the difference between interactive and non-interactive shells.
It might also depend on your particular batch system.

Sorry, I had to dig into this quite deeply for Gridengine some time ago.

1 Like

Thanks! Can I also add this line of code to the .bashrc file? And where can I find detailed documentation for this feature of Julia?

That was already mentioned in this thread: System Image Building · The Julia Language

1 Like

Yes, you can. However, there are some subtleties related to interactive vs non-interactive shells. So while it’s possible to set up your environment with global files like .bashrc, I personally got into the habit of having self-contained job submission scripts that set up whatever environment variables they need. Another thing to keep in mind is that this makes it easier to collaborate with others: you only need to give them a set of files (as opposed to documenting the set of changes that one should make in their .bashrc file).

I’d say the next big Julia-related topic will be dependency management. Does your script use any dependencies? If so, are they declared in a project environment? (i.e. did you run things like Pkg.add to populate a list of dependencies in some Project.toml file?)

If everything is well-organised from this viewpoint, then you’ll only need to specify the correct project when running julia (e.g. using the --project command-line switch).
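A sketch, assuming myscript.jl and its Project.toml live in the current directory:

# Install/resolve the dependencies declared in Project.toml / Manifest.toml,
# then run the script inside that project environment.
julia --project=. -e 'using Pkg; Pkg.instantiate()'
julia --project=. myscript.jl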


Another non-Julia-related topic to explore is how to tell SLURM what resources you want to reserve. There should be documentation on how to do this specifically for your cluster (to illustrate what I’m referring to, here is a random example I found on the Internet). To keep a self-contained submission script that fully documents how to submit the job, I’d recommend using “SLURM directives”.

Your submission script might now look like this:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
## other SLURM directives here as needed

# Environment variables
export PATH=$PATH:/path/to/julia-1.9.3/bin
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
## other env vars here as needed

# Run julia
# (activating the environment defined by the project in the current working directory)
julia --project myscript.jl

and you can submit the job simply with

shell> sbatch run.sh

(no need to add sbatch command-line arguments any more, they will be picked up from the directives in the submission script)

3 Likes

Oh! I will read that carefully! Thanks to you and @greatpet ! :handshake: :handshake:

One thing that is absolutely required, at least on the clusters I work on (e.g. Lyon CC, Erlangen RRZE), is the option

--compiled-modules=no

Without it, my jobs crash constantly due to parallel compilation clashes. To my knowledge, the worker nodes have the same CPU models, but still, something goes wrong when multiple processes compile at the same time.
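In a job script the option is simply appended to the julia invocation, e.g. (a sketch based on the submission script shown earlier in the thread):

# Skip the compiled-module cache to avoid clashes between concurrently
# precompiling jobs, at the cost of recompiling on every run.
julia --compiled-modules=no --project myscript.jl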

1 Like

I think this thread would be excellent if written up as a HOWTO or something.

Edit: well, get busy @johnh

1 Like

I really thank you for your step-by-step guidance! It’s very very helpful! :handshake: :handshake:

In fact, I am also consciously cultivating this habit myself.

Yes, I do use several dependencies in my script, and I think I now know how to organize them in a project environment (in the directory of my script there are two files, Manifest.toml and Project.toml, generated by Julia).

What I expect now is this: the program should automatically activate the environment and load (or install) the necessary dependencies when I send the entire project directory to someone else or drop it onto a cluster to run. To achieve this, I have tentatively added the following two lines of code to the top of the script:

using Pkg: activate
activate(".")

Is this a professional way of doing it? Or is that what your suggestion below is for?

As for your last bit of advice on the use of SLURM, I think I’m currently doing just that, even though I’m not very familiar with the syntax of SLURM.

Thanks again for your great help! :smiley: :handshake: :handshake: