Hello,
I am fairly new to Julia and I am trying to understand how the workload is distributed, so I prepared this small script to help me out.
An outer pmap splits the work across the worker processes; each call then distributes a calculation over the elements of its array and aggregates the results:
using Distributed
addprocs(4)
list = [rand(1, 3), rand(1, 3), rand(1, 3), rand(1, 3), rand(1, 3)]
@everywhere function aggregate_data(i)
    i * i
end
@everywhere function loop_and_sum(data::Array)
    @distributed (+) for val in data
        aggregate_data(val)
    end
end
result = pmap(x->loop_and_sum(x), list)
rmprocs(4)
In the documentation for the @distributed macro I read that it spawns independent tasks over all available workers. This is what I need to clarify: which workers are these? Aren't they the same processes that I created at the beginning of the script, which are already busy running the pmap? Is this efficient?
Also, as you can see above, I call rmprocs to clean up the workers when I am done. What I see in the Task Manager, though, is that those processes are not killed until I kill the main process (i.e. restart the Julia REPL). How is that meant to work?
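To make the first question concrete, here is a minimal sketch of how one could check which process runs which piece. The helper name where_am_i and the choice of 2 workers are my own assumptions for illustration; it only relies on myid() from Distributed, which returns the id of the process it is called on.

```julia
using Distributed
addprocs(2)  # assumption: 2 local workers, only to keep the output short

# Hypothetical helper: records the id of the process running the outer pmap
# call and the ids of the processes running each inner @distributed iteration.
@everywhere function where_am_i(data)
    inner = @distributed (vcat) for val in data
        [myid()]
    end
    return (outer = myid(), inner = inner)
end

report = pmap(where_am_i, [rand(1, 3) for _ in 1:4])
foreach(println, report)

rmprocs(workers())  # remove every worker that is still attached
```

Comparing the outer and inner ids in each printed entry should show whether the inner loop reuses the processes that are already occupied by the pmap.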
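One thing I may be getting wrong here: as far as I can tell from the documentation, addprocs(4) takes a count and returns the ids of the new workers, while rmprocs takes worker ids, so rmprocs(4) would remove only the worker whose id is 4. A minimal sketch of the cleanup under that assumption (the variable name ids is mine):

```julia
using Distributed

ids = addprocs(4)   # ids of the newly started workers
println(ids)

rmprocs(ids)        # remove exactly the workers started above
println(nworkers()) # with no workers left, this counts only the main process
```

If that reading is right, it might explain why some of the processes stay alive in the Task Manager after my original rmprocs call.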
Thanks a lot