Distributed nested in Pmap

Alessio_Bellisomi · February 22, 2019, 2:25pm

Hello,
I am fairly new to Julia, I am trying to understand a little better how the workload is distributed, so I prepared this little script to help me out.

An external pmap splits the work between different processes, each one internally will distribute some calculation over the array and aggregate.

using Distributed
addprocs(4)

list = [rand(1, 3), rand(1, 3), rand(1, 3), rand(1, 3), rand(1, 3)]

@everywhere function aggregate_data(i)
    i*i
end

@everywhere function loop_and_sum(data::Array)
    @distributed (+) for val in data 
        aggregate_data(val) 
    end
end

result = pmap(x->loop_and_sum(x), list)

rmprocs(4)

In the documentation for the @distributed macro I read that it spawns independent tasks over all available workers. This is what I need to clarify: who are these workers? Aren’t those the processes that I created at the beginning of the scripts, which are already busy pmapping stuff? Is this efficient?

Also, as you can see above I call the rmprocs to cleanup the workers when I am done. What I see in the Task Manager though, is that those process are never killed until I don’t kill the main process (~restart the Julia REPL). How is that meant to work?

Thanks a lot

dmolina · February 22, 2019, 5:11pm

I am not an expert, but:

Worker are different processes that run your code. You can indicate how many do you want to use or with “julia -p ” or with addprocs(). You should have cpus in that computer to get the right performance. Because all process need store its own data, sometimes it could be better to use thread, because they share memory (in conjuntion with SharedArrays), but sometimes you can get strange results if you do not are careful.

The use of everywehere is right, however, with “pmap” you are already using the processing in the different workers, so you should not use distributed, I think.

About rmprocs, you are not removing all processes, you are only removing the 4th process, you can remove all with:
[rmprocs(n) for n in 1:4]

or in a simpler way, storing ids obtained by addprocs:

np = addprocs(4)
…
rmprocs(np)

Alessio_Bellisomi · February 22, 2019, 5:54pm

About rmprocs, you are not removing all processes, you are only removing the 4th process

Ah, that makes total sense
Thanks for your sharing your thoughts. I am working on a script which is nesting pmaps and distributed and other quirks, you see why I am getting confused!

Topic		Replies	Views
Issue with nested pmap Julia at Scale bug , pmap	2	693	June 11, 2019
Pmap usage Performance question , parallel	1	361	December 13, 2020
Distributed: Passing views of an array for read access to workers (using pmap) General Usage question , performance , parallel , distributed , views	8	500	January 31, 2024
Pmap but with control over which workers do the tasks? Julia at Scale question	0	411	January 12, 2020
Pmap() data inputs New to Julia	4	314	March 20, 2023

Distributed nested in Pmap

Related topics