I have been trying to learn how to perform distributed parallel programming in Julia and would like to clarify some (beginner) doubts.
```julia
using Distributed
addprocs(4)

@everywhere function foo(n)
    sleep(0.001)
    return randn(10)
end

function inner_loop(func, n_times)
    out = @distributed (+) for var in [10 for i = 1:n_times]
        func(var)
    end
end

function main(n_outer_loops)
    out = zeros(10)
    for i = 1:n_outer_loops
        out += inner_loop(foo, 10)
    end
    return out
end

r = main(4)
```
My inner loop runs in parallel with the `@distributed` macro and the `(+)` reduction. Does the function wait until `func` is complete on each processor and only then add the results to the solution? That is: for each `var`, Worker 1 completes `func`, …, Worker N completes `func`; afterward, everything is added to `out`.

Or does it add to the solution as soon as `func` completes on a worker? That is: Worker 1 completes `func` and adds to `out`, …, Worker N completes `func` and adds to `out`.
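One way to see the reduction order for yourself is a small experiment (a sketch, assuming two extra workers are available; the worker IDs shown will vary). Using a non-commutative reducer like `vcat` makes the chunking visible: each worker reduces its own contiguous block of iterations locally, and the per-worker partial results are combined on the caller at the end, in range order.

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # create two workers for the demo, if none exist

# Record which worker handled each iteration. With vcat as the reducer,
# the result preserves the order in which chunks were assigned:
# each worker gets one contiguous block of the range.
ids = @distributed (vcat) for i = 1:8
    [myid()]
end

println(ids)   # e.g. worker IDs in contiguous blocks, such as [2,2,2,2,3,3,3,3]
```

So the reduction is not a single global "wait for everyone, then add": each worker folds its own chunk with `(+)` as it goes, and only the per-worker partial sums are combined at the end.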
Since this parallelizes only the inner loop, is it a suitable implementation? Or is Julia creating and destroying the parallel portion at each `outer_loop` iteration, and thus incurring non-trivial overhead? If so, can it be created once, with the jobs then assigned in the `inner_loop`s?
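One point worth checking directly (a minimal sketch, assuming workers have already been added): the worker *processes* themselves are created once by `addprocs` and persist across calls. Each `@distributed` invocation only re-spawns lightweight tasks on those existing workers, so the per-call overhead is task scheduling and serialization, not process creation.

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # workers are created once, up front

# hypothetical helper: a small distributed sum used just to exercise the workers
g() = @distributed (+) for i = 1:100
    i
end

before = workers()
g()
g()
after = workers()

# The same worker processes are reused on every call;
# @distributed neither creates nor destroys workers.
println(before == after)   # true
```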
I have read online that `@distributed` splits the jobs evenly across the workers, and that `pmap` can do some load balancing. However, `pmap` returns a vector of solutions after each `inner_loop`, which I do not need. Is it possible to use it with the same concept as `@distributed`, where instead of returning a vector it adds the solutions to the `out` variable?
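For context, `pmap` itself takes no reducer argument, so the usual pattern is to reduce its result vector afterward. A minimal sketch, assuming the same `foo` as above and two workers:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

@everywhere function foo(n)
    sleep(0.001)
    return randn(10)
end

# pmap load-balances dynamically but materializes a vector of results;
# reducing that vector gives the same kind of sum as @distributed (+).
out = reduce(+, pmap(foo, [10 for _ = 1:10]))
```

This does allocate the intermediate vector of ten 10-element results; if that allocation matters, an alternative is to have workers push results into a `RemoteChannel` and accumulate into `out` on the caller as they arrive.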