I have a workflow where I want to use my laptop as the master process, with
addprocs(machine_spec) pointing to a remote machine I SSH into (with Julia and all dependencies installed). The data is stored locally on my laptop, and I expected the master process to transfer whatever data needs processing to the remote machine.
Instead, loadtable() looks for the data on the remote machine's disk, where the child processes run. Is there a way to avoid having to duplicate my data on the remote machine?
Depending on your specific application, the easiest way is probably to mount your local data on the remote machine via sshfs. To do that, SSH into your remote machine and mount the local directory with sshfs username@localip:/dir/to/datafile /mountpoint/on/remote/machine (adapting the file paths, of course).
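As a concrete sketch (the hostnames and paths below are placeholders, not from your setup; this assumes sshfs is installed on the remote host and your laptop runs an SSH server it can reach):

```shell
# On the remote machine: create a mount point and mount the laptop's
# data directory over SSH.
mkdir -p /mnt/laptop-data
sshfs username@laptop.example.com:/home/username/data /mnt/laptop-data

# The worker processes can now read the files as if they were local:
ls /mnt/laptop-data

# Unmount when done:
fusermount -u /mnt/laptop-data
```

With this in place, loadtable() on the workers can point at /mnt/laptop-data and the reads go back over SSH to your laptop, so nothing needs to be copied.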
I don’t think there is a way to automatically transfer data between workers.
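That said, for data that fits in memory you can always ship objects from the master to a worker explicitly with the Distributed standard library — it's just not automatic at the file level. A minimal local sketch (using addprocs(1) as a stand-in for your SSH addprocs(machine_spec)):

```julia
using Distributed

addprocs(1)  # stands in for addprocs(machine_spec) to the SSH worker

# The data lives only on the master process:
data = collect(1:1_000)

# remotecall_fetch serializes `data`, sends it to the worker,
# runs the function there, and returns the result to the master:
result = remotecall_fetch(sum, workers()[1], data)

println(result)  # 500500
```

This moves the data over the wire on every call, so it only makes sense when the computation is expensive relative to the transfer; for large on-disk datasets the sshfs/NFS route is the practical one.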
I use NFS for any data sharing among workers on my local network, since it generally performs better than sshfs there; however, NFS performance over a remote connection (e.g. over the internet) is apparently not great, so I use sshfs in those situations.
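For completeness, a minimal NFS setup on a local network looks roughly like this (the export path, subnet, and server address are placeholders; on most Linux distributions the server side needs the nfs-kernel-server package):

```shell
# On the machine holding the data (the NFS server): export the data
# directory to the workers' subnet, then reload the export table.
echo "/home/username/data 192.168.1.0/24(ro,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra

# On each worker machine: mount the export.
sudo mkdir -p /mnt/data
sudo mount -t nfs 192.168.1.10:/home/username/data /mnt/data
```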