I am using RemoteChannels and Distributed for a multicore simulation with the template provided here:
https://docs.julialang.org/en/v1/manual/parallel-computing/index.html#Channels-and-RemoteChannels-1
Each process can take a few hours, so I’d like to resize my Slurm job once the RemoteChannel holding the data/parameters becomes empty and some of the workers become idle.
Is there any way to do this within Julia (i.e., to determine when a worker is idle)? I found this:
Is this the best way to get the worker to complete the do_work function after the RemoteChannel with the data/parameters becomes empty? I don’t know exactly how RemoteChannels work, so I was wondering if there is some chance that do_work here would complete with items still in the RemoteChannel.
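For comparison, here is a minimal sketch of an alternative I could imagine, under my own assumptions: instead of checking whether the channel is empty, the main process queues one `nothing` sentinel per worker after all the real jobs, and each worker stops when it takes a sentinel. Because the channel is first-in, first-out, no worker can see a sentinel while real jobs are still queued. The job values, channel sizes, and the body of do_work are placeholders, not my actual simulation.

```julia
using Distributed

addprocs(2)  # two local workers, just for the demonstration

# Placeholder do_work: echoes each job back together with the worker ID.
@everywhere function do_work(jobs, results)
    while true
        job = take!(jobs)          # blocks until an item is available
        job === nothing && break   # sentinel: no more work for this worker
        put!(results, (job, myid()))
    end
    return myid()
end

njobs = 6
# Buffer is large enough to hold every job plus one sentinel per worker.
jobs = RemoteChannel(() -> Channel{Union{Int,Nothing}}(njobs + nworkers()))
results = RemoteChannel(() -> Channel{Tuple{Int,Int}}(njobs))

foreach(j -> put!(jobs, j), 1:njobs)
# One sentinel per worker, queued after all real jobs.
foreach(_ -> put!(jobs, nothing), workers())

futures = [remotecall(do_work, p, jobs, results) for p in workers()]
finished = fetch.(futures)   # each do_work returns its worker ID when it stops
done = sort([take!(results)[1] for _ in 1:njobs])
```

With this layout, a worker leaving do_work means every real job has already been taken by someone, although another worker may still be processing its last one.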
If this does work, then I suppose I could have another channel that collects the IDs of the workers, with each worker putting its ID there just before do_work finishes. Then, the main process could use Slurm to resize the job…
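Roughly what I have in mind, as a sketch only: the `idle` channel name, the sentinel-based stop condition, the `sleep` standing in for the simulation, and the use of `rmprocs` to detach the idle worker are all my assumptions, and the actual Slurm resize step is deliberately left as a comment.

```julia
using Distributed

addprocs(2)  # two local workers stand in for the Slurm-launched ones

@everywhere function do_work(jobs, idle)
    while true
        job = take!(jobs)
        job === nothing && break   # sentinel: the queue is drained
        sleep(0.1 * job)           # placeholder for the real simulation
    end
    put!(idle, myid())             # last action: report this worker as idle
end

pids = workers()
jobs = RemoteChannel(() -> Channel{Union{Int,Nothing}}(32))
idle = RemoteChannel(() -> Channel{Int}(length(pids)))

foreach(j -> put!(jobs, j), 1:4)
foreach(_ -> put!(jobs, nothing), pids)   # one sentinel per worker

for p in pids
    remote_do(do_work, p, jobs, idle)     # fire and forget; no result needed
end

released = Int[]
for _ in pids
    pid = take!(idle)   # blocks until some worker reports in
    rmprocs(pid)        # detach the idle worker from this Julia session
    # The Slurm resize step would go here; it is left out on purpose.
    push!(released, pid)
end
```

The main process blocks on `take!(idle)`, so it reacts to each worker as soon as that worker runs out of work rather than polling.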
Thoughts?