Change the default global environment path (Julia on HPC)

A little bit of background:

I am currently working on a solution to set up Julia for grid computing where heterogeneous clusters are used. (I am aware of JuliaParallel/JUHPC on GitHub, the HPC setup for juliaup, julia and HPC key packages requiring system libraries, but we think we need a different approach.)

Currently, I am doing the following steps:

  1. export JULIA_CPU_TARGET="generic;sandybridge,clone_all;icelake-server,clone_all;znver2,clone_all;haswell,clone_all;broadwell,clone_all;znver1,clone_all;skylake-avx512,clone_all;znver3,clone_all;cascadelake,clone_all" to allow for compilation for all the kinds of CPUs we use around the world (the value is quoted because the shell would otherwise treat each ; as a command separator)
  2. Download Julia 1.11 (needed to be able to relocate compile artifacts) to a local computer and use it to create an environment called @baseenv, then install a bunch of commonly used packages to trigger a good amount of precompilation
  3. Repeat with a few additional environments, e.g. teaching or specific analysis scenarios with some other sets of packages.
  4. tar the whole environment and upload it to CVMFS (a distributed file system which syncs across many clusters). The mtime of the files is not touched in the process (important to avoid unnecessary precompilation)
  5. Set up a script to activate the environment, using a custom (writable) folder as the depot where users install additional packages (or different versions), plus a second depot pointing to the one on CVMFS that was used to create the artifacts in steps 2 and 3
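
As a minimal sketch of step 4, the pack/unpack round trip could look like the following. The helper names pack_depot and unpack_depot are hypothetical and not part of the actual setup; the sketch only relies on tar storing each file's mtime in the archive and restoring it on extraction unless -m/--touch is passed.

```shell
# Sketch only: pack_depot and unpack_depot are hypothetical helper names.

pack_depot() {
    # $1: depot directory to pack, $2: tarball to create
    # tar records each file's mtime in the archive by default.
    tar -C "$(dirname "$1")" -czf "$2" "$(basename "$1")"
}

unpack_depot() {
    # $1: tarball, $2: directory to extract into
    # Extraction restores the recorded mtimes (do not pass -m/--touch).
    mkdir -p "$2"
    tar -C "$2" -xzf "$1"
}
```

Whatever tool then copies the extracted tree onto CVMFS has to preserve mtimes as well; that part is outside this sketch.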

Here is the activation script:

JULIA_DIR=/cvmfs/km3net.egi.eu/julia/x86_64/1.11.1
export PATH=${JULIA_DIR}/bin:$PATH

# Try to use already available packages in the cache and not the latest
# and greatest, to decrease traffic and additional compilation.
export JULIA_PKG_PRESERVE_TIERED_INSTALLED=true

# The first entry in JULIA_DEPOT_PATH is the one which is
# writable for the user. All others are read-only and will
# also be searched for precompiled code before creating
# new ones in the first path.
# Here, we set an appropriate directory for each cluster
if [[ "$(hostname -d)" == *in2p3.fr ]]; then
    export JULIA_WRITABLE_DEPOT_PATH="/sps/km3net/users/$USER/.julia"
else
    export JULIA_WRITABLE_DEPOT_PATH="$HOME/.julia"
fi

JULIA_CVMFS_READONLY_DEPOT_PATH=${JULIA_DIR}/depot
export JULIA_DEPOT_PATH=${JULIA_WRITABLE_DEPOT_PATH}:${JULIA_CVMFS_READONLY_DEPOT_PATH}
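
Since everything depends on the first depot entry being writable, the activation script could verify that before Julia starts. This is a minimal sketch; first_depot and check_writable_depot are hypothetical helpers, not something Julia provides.

```shell
# Sketch only: both function names are hypothetical.

# Print the first entry of a colon-separated depot path.
first_depot() {
    printf '%s\n' "${1%%:*}"
}

# Create the first depot if it is missing and report whether the
# current user can write to it. Returns 0 on success, 1 otherwise.
check_writable_depot() {
    _depot=$(first_depot "$1")
    mkdir -p "$_depot" 2>/dev/null
    if [ -d "$_depot" ] && [ -w "$_depot" ]; then
        return 0
    fi
    echo "warning: first depot '$_depot' is not writable" >&2
    return 1
}
```

It would be called as check_writable_depot "$JULIA_DEPOT_PATH" right after the export.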

I encounter two big problems:

  1. The default environment is pointing to a read-only directory: /cvmfs/km3net.egi.eu/julia/x86_64/1.11.1/depot/environments/v1.11/Project.toml, so nobody can install anything in it

I tried to set JULIA_LOAD_PATH manually, but when I point it to an empty folder, there is no active environment when starting Julia:

export JULIA_LOAD_PATH=/sps/km3net/users/tgal/.julia/depot/environments
pkg> st
ERROR: no active project

pkg> activate

pkg> add Revise
ERROR: no active project

Question: how do I set the default global environment to a specific path? It would be great if it were in the depot folder which is writable by the user.

  2. I still see a lot of compilation when using packages which were already precompiled. I guess this needs more bisection. I am not sure whether it is caused by my using 3 different environments when creating the read-only artifacts, where some packages (and their dependencies) are contained in more than one environment; my guess is that precompilations are not isolated per environment. Meaning: if e.g. PackageA is in two different environments and, for some reason, its dependencies are resolved differently in the two, the precompilation of PackageA in one environment might be invalidated in the other. Or are they isolated?
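
One way to start that bisection, as a minimal sketch, is to check whether a package actually resolved to different versions across the environments. The helpers manifest_version and compare_manifests are hypothetical, and the awk pattern assumes the [[deps.Name]] layout of manifest format 2.0. It only reports versions and says nothing about whether the caches are isolated.

```shell
# Sketch only: both function names are hypothetical.

# Print the version recorded for package $2 in the Manifest.toml at $1.
manifest_version() {
    awk -v pkg="$2" '
        # Track whether we are inside the [[deps.<pkg>]] entry.
        /^\[\[/ { in_pkg = ($0 == "[[deps." pkg "]]") }
        # Print the first version line of that entry, without quotes.
        in_pkg && /^version = / { gsub(/"/, "", $3); print $3; exit }
    ' "$1"
}

# Print "<version><TAB><manifest>" for package $1 in each given manifest.
compare_manifests() {
    _pkg=$1
    shift
    for _m in "$@"; do
        printf '%s\t%s\n' "$(manifest_version "$_m" "$_pkg")" "$_m"
    done
}
```

For example, compare_manifests PackageA followed by the Manifest.toml paths of the environments prints one line per manifest, so differing versions stand out.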

I think I found a work(around|ing solution):

I added the snippet below to the activation script. It sets JULIA_PROJECT to an environment called v1.11, so for the user it looks and feels like the usual global environment. The path also needs to be prepended to JULIA_LOAD_PATH so that it stacks.
I also deleted the v1.11 environment from the distributed folder, otherwise there would be two @v1.11 entries in LOAD_PATH in the Julia runtime.

# JULIA_PROJECT is set to a writable directory in the users depot
# to replace the default global environment, which is read-only
# The path is prepended to the JULIA_LOAD_PATH variable to mimic
# the loading preference of packages.
export JULIA_PROJECT=${JULIA_WRITABLE_DEPOT_PATH}/environments/v1.11
mkdir -p "${JULIA_PROJECT}"
export JULIA_LOAD_PATH=${JULIA_PROJECT}:${JULIA_LOAD_PATH}
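
The hard-coded v1.11 could also be derived from the Julia version, so the script does not need editing for every new release. This is a minimal sketch: julia_env_name is a hypothetical helper, and it assumes the last component of JULIA_DIR is the full version number, as in the 1.11.1 path used in the activation script.

```shell
# Sketch only: julia_env_name is a hypothetical helper.

# Turn a full Julia version such as 1.11.1 into the name of the
# default environment directory, v<major>.<minor>.
julia_env_name() {
    _major=${1%%.*}     # text before the first dot
    _rest=${1#*.}       # text after the first dot
    _minor=${_rest%%.*} # text up to the next dot
    printf 'v%s.%s\n' "$_major" "$_minor"
}
```

With that, the export could read JULIA_PROJECT="${JULIA_WRITABLE_DEPOT_PATH}/environments/$(julia_env_name "$(basename "$JULIA_DIR")")".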

I’m missing why you’re mixing up the load path and the environment.


OK let me try to summarise what I want to achieve:

  • Provide Julia in a central place (CVMFS as a network mount for HPC centres around the world) for grid computing with a good amount of commonly used packages already precompiled to save time and reduce unnecessary computation, all in all to make the Julia experience snappier, especially for newcomers
  • After sourcing an activation script, the user should have their own Julia environment, their own global environment which is clean and a given place to store additional/other packages, based on a central configuration (different paths on different clusters in future)

I am using some “global” environments in the commissioning phase just to trigger preinstallations and also make them visible for users e.g. to solidify a set of environments for common tasks.

I figured that “mixing up the load path and the environment” basically emulates the behaviour of a standard user installation of Julia, but maybe that’s the wrong approach?

Disclaimer: I haven’t read your opening post in detail.

It’s fine to provide “global” pre-defined environments as opt-in, if you think that’s helpful. But I would generally advise you to not modify the user’s JULIA_LOAD_PATH (except for setting global Preferences.jl). For the reasons mentioned here.

For the more general case, trying to provide a shared Julia depot (through stacking via JULIA_DEPOT_PATH) is fine. The tricky part, however, is to precompile the correct versions of a set of selected packages (and their dependencies). This is especially tricky because you need to keep up with the pace of new releases of these packages (and their dependencies), and you don’t control which versions the user / the resolver will choose (it might help to use the tiered install option to mitigate this a bit). For this reason, it will either require quite a bit of continuous work, which HPC admins likely won’t help you with, or at least good automation.

(BTW, @JamesNZ asked a very similar question on Slack the other day.)

This is surprising to me. It should point to the writable depot (the first in the stack). Try deleting or renaming the environments folder in the read-only depot.

I agree with you of course, but as I wrote, the problem is that Julia tries to write to the global read-only environment. The problem with removing the environments/ folder (as you suggested above) is that it also removes the other global environments, but that’s not a huge problem since I can also put them in another directory. I feel like environments/ is reserved and part of the logic, so if it’s there, it will be preferred. That’s at least how I interpret your suggestion and the quote "surprising to me" :wink:

Yes, I know, but if it already sets up our own private/internal registries, that’s a bonus. I also have some Ansible playbooks to keep things more or less up-to-date; outdated packages are not a huge problem though :slight_smile: