I have pretty much the same setup as described here: I have a local run file that should start parallel computations on remote machines. These computations require functions defined in several files. Simply including these files would require them (and their folder structure) to be physically present on the remote machines, which I would like to avoid.
The suggested solution in the above post (using ParallelDataTransfer.jl) does not work for me when I test it with local workers:
using Distributed
using ParallelDataTransfer: include_remote

file = touch("tmp.jl")
open(file, "w") do f
    write(f, "fun() = 42")
end

addprocs(1, dir=pwd())

#%% does not work, "fun not defined"
include_remote(file, workers())
@fetchfrom workers()[1] fun()

#%% works
@spawnat workers()[1] include(file)
@fetchfrom workers()[1] fun()
Is ParallelDataTransfer simply not working any more? I see that there hasn't been much activity in the package over the last couple of years. I am running Julia 1.7.2.
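For what it's worth, the only file-free workaround I can think of is to read the source on the master and evaluate it on each worker myself, continuing from the snippet above (a rough sketch, not thoroughly tested):

# read the file locally and evaluate its contents in Main on every worker,
# so the file itself never needs to exist on the remote machines
code = read(file, String)
for w in workers()
    remotecall_wait(include_string, w, Main, code)
end

@fetchfrom workers()[1] fun()   # returns 42

That is essentially what I expected include_remote to do for me.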
May I ask what type of remote machines these are?
On an HPC setup you would normally have a filesystem which is mounted on all workers.
If you are using cloud machines you should be able to set up a shared filesystem quite easily.
These remote machines are local cluster nodes running Linux, administered by our research group. You are right that they all share a common filesystem.
However, I would typically initiate the overall process from my personal computer and just ship all the computation out to the workers (master/worker style). That means I currently have to make sure my local scripts stay synchronized with their counterparts on the shared cluster filesystem, so two locations in total. That's certainly manageable, but if this redundancy could be avoided, even better.
OK, so you do not mount the shared filesystem on your laptop - this makes sense as it travels around with you.
There are many ways of syncing filesystems - a quick search turns up plenty of options.
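For instance, something like rsync can be driven directly from the run script before the workers are added; a rough sketch (host and paths are made up):

# push the local project to the cluster's shared filesystem
# (hypothetical host and paths) before adding remote workers
localdir = pwd()
remote   = "user@cluster.example.org:/shared/projects/myproject/"
run(`rsync -az --delete $localdir/ $remote`)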
Another thought - how about keeping your code in a Git repository?
At the start of your job the Julia code runs a git checkout.
Somebody much smarter than me will come along and show that the Julia Package Manager can do this job much better.
If your remote machines use a batch processing system, it will create a unique per-job directory on each node, typically exposed through an environment variable like $TMPDIR.
You can use a git checkout to get your code into that directory.
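A minimal sketch of that checkout step in Julia, assuming a per-job $TMPDIR and a placeholder repository URL:

# clone the code into the per-job scratch directory at the start of the run;
# the repository URL is a placeholder
jobdir = get(ENV, "TMPDIR", mktempdir())
repo   = "git@git.example.org:mygroup/myproject.git"
target = joinpath(jobdir, "myproject")
isdir(target) || run(`git clone --depth 1 $repo $target`)
push!(LOAD_PATH, joinpath(target, "src"))   # or include() the files you need

If the nodes do not share a filesystem, the same snippet can be run on every worker (e.g. via @everywhere) so each machine ends up with its own copy.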
I have a similar use case and run distributed code on machines that are not part of any cluster at all - just a few random workstations in our group. Julia doesn't really have built-in features to make this easy, so I wrote a small helper package: [ANN] DistributedAdhoc.jl: for machines that don't share a common filesystem. It can transfer individual files and directories without extra setup, and share whole Julia environments from the local machine.
Yes, that's my current solution. Since I use DrWatson to manage the project, the syncing itself is no problem. You just need to remember to do it - at some point I spent quite a while trying to figure out why things worked locally but not on the remote machines, until I realized I had forgotten to sync the two repos… Arguably, this won't happen to me again, but who knows…
That’s a good point. If you just want to combine some random workstations, your approach seems very valuable. Thanks!