Good day everyone!
As the title suggests, I have a question about how to run a parallel computation on multiple nodes in a cluster.
My Julia code calls an external library written in C, through a Julia module that serves as the interface (that is, the module calls the C functions), and parallelizes the work across multiple processes. When I launch it with the following commands:
(…)
export JULIA_LOAD_PATH=(directory in which my Julia module is defined)
export LD_LIBRARY_PATH=(directory in which my C library is defined)
./julia -p (number of processes) ./myprogram.jl etc …
(…)
my code works perfectly, both on the cluster and on my laptop. Now I would like to use more cores than a single node provides, that is, to run the code across multiple nodes of the cluster.
After requesting multiple nodes in the .sh job script (number of CPUs, etc.), I immediately noticed that, with the commands:
(…)
export JULIA_LOAD_PATH=(directory in which my Julia module is defined)
export LD_LIBRARY_PATH=(directory in which my library is defined)
srun hostname -s > hostfile
./julia --machine-file ./hostfile ./myprogram.jl etc.
(…)
the same code doesn’t work, because the workers can’t find the module that “myprogram.jl” loads with:
@everywhere using Mymodule
In fact the output I get is:
ERROR: LoadError: On worker 2:
ArgumentError: Package Mymodule [top-level] is required but does not seem to be installed:
Run Pkg.instantiate() to install all recorded dependencies.
(…)
This also happens when I launch the code on a single node with the syntax above (i.e. using --machine-file), so the problem is definitely in how I am telling Julia to launch the worker processes, and in how (and from where) they load packages. Clearly I have to do something differently, but even after reading the documentation (and other similar discussions) I don’t understand exactly what.
Can anyone tell me where I’m going wrong and what I should do?
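In case it helps with the diagnosis, here is a minimal sketch of a check that could go at the top of “myprogram.jl”, before the `@everywhere using Mymodule` line. It only prints what each process actually sees. The helper name `report_env` is my own invention (it is not part of Distributed), and I am assuming the workers have already been started with `-p` or `--machine-file`:

```julia
using Distributed

# Hypothetical helper (not part of Distributed): ask every process for its
# host name, its effective LOAD_PATH, and the two environment variables
# that are exported in the job script.
function report_env()
    map(procs()) do p
        remotecall_fetch(p) do
            (id = p,
             host = gethostname(),
             load_path = copy(LOAD_PATH),
             julia_load_path = get(ENV, "JULIA_LOAD_PATH", "<unset>"),
             ld_library_path = get(ENV, "LD_LIBRARY_PATH", "<unset>"))
        end
    end
end

# Print one line per process so the master and the workers can be compared.
for r in report_env()
    println(r)
end
```

If the workers report different values from process 1, that would suggest the exported variables are not reaching them, but I have not confirmed that this is the cause.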
Thanks everyone in advance for your help!