Single folder shared by different machines

Hi everyone, I’m quite new to Julia, and I’m having trouble finding the best way to have a shared folder between several machines that might run some Julia scripts in parallel.

My typical workflow is the following:

  • I log into a server where I have a personal home folder ~ with Julia 1.9.3 installed there. Suppose I have a long-running script test.jl in ~.
  • From the server I have access to a cluster of machines (with different hardware) that have shared access to my folder ~. I can submit my script test.jl to the cluster so that it will be run on one of its machines (it could be any of them). I also need to be able to submit test.jl (or other scripts) to multiple machines in parallel.
  • Note that I have no control over when the scripts will actually be executed once I submit them; the cluster's scheduler decides. This means that if I submit two different scripts, they might be executed at the same time on different machines (which share the ~ folder containing the Julia installation).

My question is how to handle this scenario simply and cleanly, so as to avoid package problems, precompiled caches being overwritten simultaneously by multiple machines with different architectures, and so on.
I have read a lot about environments and the JULIA_DEPOT_PATH variable, but I still haven't found a clear solution to my problem.

Every script I have starts with import Pkg and a Pkg.add call for every package, followed by using for every package. I thought I would be able to omit the Pkg.add lines once the packages were installed in my environment, but I encountered errors on some machines when I removed them. I guess that adding the packages forces precompilation of the environment, so that every machine is sure to compile the sources of the required packages for its specific architecture.
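A common alternative to calling Pkg.add in every script is to record the dependencies once in the project's Project.toml/Manifest.toml and have each script only activate and instantiate that environment. A minimal sketch of such a script header, assuming the project sits next to the script (the package name and file name below are examples, not from this thread):

```shell
# Hypothetical sketch: write the script header that replaces the per-script
# Pkg.add calls. It assumes the packages were added to the project once.
cat > header.jl <<'EOF'
import Pkg
Pkg.activate(@__DIR__)   # use the Project.toml next to the script
Pkg.instantiate()        # installs/precompiles only if something is missing
using DataFrames         # example package
EOF
```

Pkg.instantiate() is close to a no-op when the environment is already complete, so it is much cheaper than re-running Pkg.add on every submission.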

However, even in this case there is still the big issue of multiple machines precompiling the project at the same time, which is very likely to happen since I can have lots of scripts running simultaneously on the cluster.

I assume a good approach would be to have each machine save precompiled caches in a directory of its own (maybe based on its hostname), and only read them from there. I suppose this should be related to the JULIA_DEPOT_PATH variable, but I don’t know exactly how.
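The hostname idea can be sketched with JULIA_DEPOT_PATH's depot stacking: Julia writes new files (including precompiled caches) to the first depot in the stack and reads from the later entries. A minimal sketch, assuming a shared $HOME and a per-machine depot name of my own invention:

```shell
# Minimal sketch of a per-machine depot on a shared filesystem. Julia writes
# to the FIRST entry of JULIA_DEPOT_PATH and falls back to later entries for
# reading, so each machine gets its own cache directory while package sources
# stay in the shared ~/.julia depot.
HOST_DEPOT="$HOME/.julia-$(hostname)"   # per-machine depot (assumed naming)
mkdir -p "$HOST_DEPOT"
export JULIA_DEPOT_PATH="$HOST_DEPOT:$HOME/.julia"
echo "$JULIA_DEPOT_PATH"
```

This would go in the job script (or shell profile) before launching julia, so that concurrent jobs on machines with different hardware never write to the same cache files.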

I’m sorry if my question is too basic, but I’m quite new to Julia and I wasn’t able to figure out how to solve this. I hope you can help me understand more about how Julia works.

Assuming all machines run the same OS (Linux), the solution should be relatively easy. If your home is the same shared directory /home/your_name, then Julia will look for installed packages in /home/your_name/.julia/packages by default. On the worker machines you then run your script as usual with the correct environment, and packages will be loaded from the shared /home/your_name/.julia/packages. There is no need to run Pkg.add every time, since the packages are already there, and no need to change environment variables.

I hope I was sufficiently clear. This is how I do it on our school cluster.

Thank you for the extremely quick reply, and yes, all machines use Linux. I'm not sure I fully understand what you suggest. If I understood correctly, you're suggesting I create a different environment for every machine so that it will load packages from there… is this what you mean by running a script "with a correct environment"? Will this solve the precompilation problem?


Even the environment is shared.

  • I assume all machines on the cluster see /home/your_name/.julia.
  • All machines also see /home/your_name/julia/cool_project, where you have the project and the environment.

Then the best way is to just run Julia as julia --project=. in the directory /home/your_name/julia/cool_project (if you use Slurm, it has a -D option to specify the working directory), and that's it. Assuming the packages were instantiated, it will run without an issue.
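For concreteness, a hypothetical Slurm sketch of this setup (the job script name and contents are examples, not something from this thread). The batch script assumes it starts in the project directory, so --project=. picks up the shared environment:

```shell
# Hypothetical sketch: a minimal batch script that runs the Julia script
# with the project environment of the directory it starts in.
cat > run_test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
julia --project=. test.jl
EOF
# Submit from the login node; -D sets the job's working directory:
#   sbatch -D "$HOME/julia/cool_project" run_test.sh
```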