Installation on managed cluster

I install scientific software on a big supercomputer cluster, and we have occasional requests for Julia. So I thought I’d make an installation that contains a bunch of popular packages such as HDF5 and MPI.

  • Installing from source was trivial. Kudos!
  • By setting JULIA_DEPOT_PATH, I was able to get some packages to appear in a system location rather than in my home directory.
  • However, when I give users my executable and that depot path, Julia then tries to write things like log files to that system location.

So, as a complete non-user of Julia, meaning that you have to spell things out for me, I could use some help in how to set up a central installation.

I think that users won’t benefit a lot from having local package installations. Julia will compile the code for each user anyway and store it in the user’s home directory (in the .julia directory).

It could be useful to have a local mirror of the General registry, but that’s only worthwhile if users’ access to external sites is slow or cumbersome for any reason.

(Even having a centralized installation of Julia itself is probably not very useful. Just downloading the binaries or managing Julia with juliaup is simple enough that users shouldn’t have many issues. I’m guessing the cluster has some Linux flavor installed.)

I’m going to agree with @lmiq here. I work with Julia on a cluster. Our cluster uses EasyBuild to configure software installations. I have had no issues using EasyBuild to install Julia in my home directory – it configures everything correctly. Once I load it with module load Julia/specific-version-name it works just as it does on my local computer. The bonus of this is that I can upgrade as soon as I want (I installed 1.9.1 this morning).

I see your point. But.

“Just downloading the binaries”

Right. This whole issue started when a user observed that the downloaded binary was factors slower than expected. My install from source was much faster, presumably because the compiler could find a better instruction set and make other optimizations and whatnot.

Never use the word “just” around me. Please.

Does that include the MPI module finding the right network? We ban, eh, sternly chide, users for running parallel jobs through the generic ethernet network rather than our expensive high speed network. If you tell me that “add MPI” finds the right network, then I’m happy.

(So that was going to be my next question: how to tell the MPI module to find the right network.)

The docs for the Julia MPI interface have instructions for setting things up in cases like yours here. There is a section specific to the cluster case.

I guess I was too loose with my language when I said everything haha. I don’t use things like MPI (I just use a bigger node!).

Cool. But it doesn’t look like I can set that up for users, right? I’ll have to make very clear instructions for anyone wanting to install Julia.

No, if you follow the instructions further down on that page, your users shouldn’t need to do anything besides add MPI to access whatever system-specific MPI binary you’ve specified.

I recommend looking into how JLL packages work and how to override them.

https://docs.binarybuilder.org/stable/jll/#Overriding-the-artifacts-in-JLL-packages

Also see the docs on how depots stack and on code loading

https://docs.julialang.org/en/v1/base/constants/#Base.DEPOT_PATH
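To make the depot stacking concrete, here is a minimal sketch of what a module file could export. It assumes a hypothetical shared depot at /opt/apps/julia/depot (not a path from this thread) and relies on the documented behavior that the first DEPOT_PATH entry is the writable user depot, where logs and newly installed packages go, while later entries are only searched. How an empty entry expands has varied somewhat between Julia versions, so treat the trailing colon as something to check against the docs for the version you install.

```shell
# Sketch of the environment a module file could set (all paths are assumptions).
SHARED_DEPOT="/opt/apps/julia/depot"   # hypothetical admin-managed, read-only depot
USER_DEPOT="${HOME}/.julia"            # per-user writable depot (Julia's default location)

# Order matters: the first entry is the one Julia writes to (logs, new packages,
# compiled caches); later entries are only searched.
# The trailing ":" is an empty entry, which Julia expands to its default depots
# (for example the ones bundled with the installation).
export JULIA_DEPOT_PATH="${USER_DEPOT}:${SHARED_DEPOT}:"

echo "JULIA_DEPOT_PATH=${JULIA_DEPOT_PATH}"
```

With this ordering, users should keep writing to their own depot while still being able to pick up whatever was pre-installed in the shared one. To populate the shared depot, the administrator would presumably run Julia once with the shared path listed first.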

Wow @VictorEijkhout has joined the Julia discourse! Victor, I have your book Introduction to High Performance Scientific Computing on my bookshelf.

Victor’s books are here: http://theartofhpc.com/

This here might be helpful: GitHub - hlrs-tasc/julia-on-hpc-systems: Information on how to set up Julia on HPC systems

Note that it is a bit outdated though.

I can’t share our setup right now because I’m on my phone. Feel free to ping me again if you haven’t heard from me later this week.

cc @sloede

What was slow specifically? The Julia distribution is mainly a compiler plus a runtime. Unless you’re hitting a specific workflow where the runtime matters a lot (for example, the garbage collector), the common knowledge is that the code the compiler generates shouldn’t depend much (if at all) on how the compiler itself was compiled, i.e. whether it targets the generic base architecture or the exact ISA of the current CPU.

You’re on the wrong level. I think the observation was that the compiler itself was slow. For instance in starting up the Julia executable to begin with.

So what you observed to be slow wasn’t the code generated by the compiler, that’s why I asked what was slow 🙂

I should imagine that @VictorEijkhout knows this very well… often on HPC systems the user’s home directory is on a slow and space-limited NFS filesystem. The working storage is on faster parallel storage with high capacity. Is this a factor here?

Combination of space considerations and that I like to provide optimized MPI, maybe HDF5, other packages where the context makes a difference.

For all practical purposes, in our experience the location of the Julia depot is usually not a performance-critical factor before you scale to >1000 MPI ranks (at least since Julia v1.7).

@VictorEijkhout You should be able to use centrally managed Preferences.jl and/or environment variables to configure the correct settings for MPI.jl and HDF5.jl. For this, you need to put a centrally managed LocalPreferences.toml file in the users’ JULIA_LOAD_PATH (e.g., by setting JULIA_LOAD_PATH appropriately in a module file). In the case of environment variables, you can just set them directly inside, e.g., the Julia module. The contents of the preferences file or the relevant environment variables can be found in the MPI.jl/HDF5.jl docs (let me know if you have trouble finding the right place).
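As a rough illustration of the module-file approach, the sketch below exports JULIA_LOAD_PATH so that a site-wide preferences directory is appended to each user's load path. The directory /opt/apps/julia/site-prefs is a made-up name, and the sketch assumes a LocalPreferences.toml has already been placed there.

```shell
# Hypothetical directory holding the centrally managed LocalPreferences.toml.
SITE_PREFS_DIR="/opt/apps/julia/site-prefs"

# The leading ":" is an empty entry, which Julia expands to its default load
# path. The user's own environments therefore come first and the site
# directory is appended after them.
export JULIA_LOAD_PATH=":${SITE_PREFS_DIR}"

echo "JULIA_LOAD_PATH=${JULIA_LOAD_PATH}"
```

As far as I recall, the administrator notes in the MPI.jl docs also ask for a Project.toml next to the preferences file that lists MPIPreferences, so it is worth checking that page for the exact contents before rolling this out.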

For our purposes, we have found that maintaining centrally managed package installations in a shared depot is not worth the hassle. Even on systems where users cannot access the internet directly, there are ways around that (see, e.g., this topic).

Thanks. That was fairly clear. Eh,

[staff julia:132] cat ${juliaroot}/depot/environments/v1.10/LocalPreferences.toml
[MPIPreferences]
_format = "1.0"
abi = "MPICH"
binary = "system"
libmpi = "libmpi"
mpiexec = "mpiexec"

I had expected to see a path to the libmpi and a trailing .so. Should I edit that? Is there a way to run ldd on a Julia executable to see where it gets its MPI? (Yes, I know. Mutatis mutandis.)

The value of libmpi is whatever you can pass to dlopen to find the library. The basename without extension is fine for the dynamic loader, and it’s generally preferable because it’s platform-independent. Of course this doesn’t matter much when you’re doing system-specific configuration.

Not really. Julia dynamically loads shared libraries like plugins with dlopen. After you have installed the MPI.jl package, you can see the full path of libmpi with:

using MPI
MPI.API.libmpi

But if you want to be extra sure of which library is used, you can also give the absolute path of your libmpi as the libmpi value; that also works for dlopen.