JuliaDB w/ Workers on Remote Machine

versipellis · June 18, 2019, 4:01pm

I have a workflow where my laptop is the master process and addprocs(machine_spec) points to a remote machine that I reach over SSH (with Julia and all dependencies installed). The data is stored locally on my laptop, and I expected the master process to transfer the data that needs processing to the remote machine.

However, loadtable() looks for the data on disk on the remote machine, where the worker processes run. Is there a way to avoid duplicating my data on the remote machine?

daniel · June 18, 2019, 4:43pm

Depending on your specific application, the easiest way is probably to remotely mount your local data on the remote machine via sshfs. For that, ssh into your remote machine and mount the local directory via sshfs username@localip:/dir/to/datafile /mountpoint/on/remote/machine (adapt the file paths of course).
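As a rough sketch of that mount step, something like the script below could be run on the remote machine. The user, address, and paths are the same placeholders as above. The variable names, the DRY_RUN switch, and the run helper are only illustrative assumptions. It also assumes the laptop runs an SSH server that the remote machine can reach.

```shell
#!/bin/sh
# Sketch: run on the REMOTE machine to mount a directory that lives on the laptop.
# All values are placeholders (assumptions); override them via the environment.
LAPTOP_USER="${LAPTOP_USER:-username}"
LAPTOP_ADDR="${LAPTOP_ADDR:-localip}"      # laptop address as seen from the remote machine
DATA_DIR="${DATA_DIR:-/dir/to/datafile}"   # data directory on the laptop
MOUNT_POINT="${MOUNT_POINT:-/mountpoint/on/remote/machine}"
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands, do not run them

# Hypothetical helper: print a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run mkdir -p "$MOUNT_POINT"
run sshfs "$LAPTOP_USER@$LAPTOP_ADDR:$DATA_DIR" "$MOUNT_POINT"
# To unmount later on Linux: fusermount -u "$MOUNT_POINT"
```

With the default DRY_RUN=1 the script only prints the two commands. Setting DRY_RUN=0 performs the mount, after which the workers can be given a path under the mount point instead of a copy of the data.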

I don’t think there is a way to automatically transfer data between workers.

jpsamaroo · June 18, 2019, 6:25pm

At home I use NFS for any data sharing among workers, since it generally performs better than sshfs. Over a remote connection (such as the internet), however, NFS performance is apparently not great, so I use sshfs in those situations.
