I am using ClusterManagers.ElasticManager to handle up to hundreds of Julia processes on my cluster. Say I do:
using Distributed, ClusterManagers
c = ClusterManagers.ElasticManager(myip, port, "foobar")
and then submit several jobs through our queueing system, each of which starts a Julia process with
submit2queue echo "using ClusterManagers; ClusterManagers.elastic_worker(\"foobar\")" | julia
The Julia processes start up over some period of time as resources become available. Once I have a considerable number of workers, I may start some computation:
@everywhere f(x) = (sleep(1); 2 * x)
results = pmap(f, 1:100)
This all works nicely until a “late” worker registers with the ElasticManager while the pmap is still running. Julia is smart and tries to use these workers as well. However, they do not know what f(x) is, because they missed the @everywhere ... call, and therefore throw an error.
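The situation can be reproduced without a cluster. The following is only a minimal sketch, assuming that local workers started with `addprocs` behave like elastic workers in this respect; the names `early`, `late`, and `has_f` are made up for illustration and do not appear in my real setup:

```julia
using Distributed

# Local processes stand in for the workers started through the queueing system
# (an assumption made only to keep the example self-contained).
early = addprocs(2)

# Defines f on the master and on every worker that exists right now.
@everywhere f(x) = 2 * x

# A worker that joins afterwards, like a late ElasticManager registration.
late = first(addprocs(1))

# Hypothetical helper: ask a worker whether f is defined in its Main module.
has_f(pid) = remotecall_fetch(() -> isdefined(Main, :f), pid)

early_has_f = all(has_f, early)  # workers present during @everywhere
late_has_f = has_f(late)         # worker that missed @everywhere

println("early workers know f: ", early_has_f)
println("late worker knows f:  ", late_has_f)
```

The early workers report that `f` is defined and the late one reports that it is not, which matches the error I get once pmap hands work to a late worker.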
How can I initialize such late workers?