I have been trying to learn distributed parallel programming in Julia and would like to clear up some (beginner) questions.
using Distributed
addprocs(4)

@everywhere function foo(n)
    sleep(0.001)
    return randn(10)
end

function inner_loop(func, n_times)
    out = @distributed (+) for var = [10 for i = 1:n_times]
        func(var)
    end
end

function main(n_outer_loops)
    out = zeros(10)
    for i = 1:n_outer_loops
        out += inner_loop(foo, 10)
    end
    return out
end

r = main(4)
My inner loop runs in parallel via the @distributed macro with the (+) reduction. Does it wait until func has completed on every worker and only then add the results to the solution?
For var=1: Worker1 completes func, ..., WorkerN completes func. Afterward, everything is added to out.
Or does it add to the solution as soon as func completes on a worker? Worker1 completes func and adds to out, ..., WorkerN completes func and adds to out.
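To make my question concrete, here is a small experiment I could run (`tagged` is just a throwaway helper I made up, not part of my real code) to see which worker handles which iteration and whether the partial sums are combined at the end:

```julia
using Distributed
addprocs(2)

# Hypothetical helper: print which worker runs each iteration, return the index.
@everywhere function tagged(i)
    println("iteration $i on worker $(myid())")
    return i
end

# @distributed splits 1:8 into contiguous chunks, one per worker; each worker
# reduces its own chunk locally with (+), and the per-worker partial sums are
# then combined on the calling process.
total = @distributed (+) for i = 1:8
    tagged(i)
end
@assert total == sum(1:8)  # 36
```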
Since this parallelizes only the inner loop, is it a suitable implementation? Or is Julia creating and destroying the parallel portion at each outer loop iteration, incurring non-trivial overhead? If so, can it be created once, and the jobs then assigned in the inner_loops?
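To check my understanding of where the overhead would come from, here is a minimal sketch (assuming the worker ids tell the story): addprocs creates the worker processes once, and I would expect each @distributed call to reuse those same processes rather than spawn new ones:

```julia
using Distributed
addprocs(2)  # worker processes are created once here and persist

@everywhere f(x) = x^2

# Record the worker ids before running several @distributed calls in a loop.
before = workers()
for outer = 1:3
    # Each call only spawns lightweight tasks on the existing workers;
    # the processes themselves are not torn down between calls.
    @distributed (+) for i = 1:4
        f(i)
    end
end
@assert workers() == before  # same worker ids throughout
```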
I have read online that @distributed splits the jobs evenly across the workers, and that pmap can do some load balancing. However, pmap returns a vector of solutions after each inner_loop, which I do not need. Is it possible to use it with the same concept as @distributed, where instead of returning a vector it adds the solutions to the out variable?
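As a concrete version of what I am asking, something like the following would work (`g` is a hypothetical stand-in for my foo), but it still materializes the intermediate vector before reducing it with sum:

```julia
using Distributed
addprocs(2)

# Hypothetical stand-in for foo: returns a length-n vector of ones.
@everywhere g(n) = fill(1.0, n)

# pmap load-balances across workers and returns a vector of per-call results;
# reducing that vector with sum() recovers the @distributed (+) behavior,
# at the cost of allocating the intermediate vector of results.
results = pmap(g, [10 for _ = 1:4])
out = sum(results)  # elementwise sum of the four 10-vectors
@assert out == fill(4.0, 10)
```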