Struggling with pmap


I am struggling to understand why pmap is so slow. As a test, I use the following fast function, rand():

using Distributed
@everywhere using Random
@everywhere g(x) = rand()
@time map(g, 1:100_000_000);
  0.697132 seconds (10 allocations: 762.940 MiB, 3.11% gc time)
@time pmap(g, 1:100_000_000);
# still running...

In my use case, I have a function g that performs stochastic simulations and is quite fast to execute, but I want to run it millions of times.

I am sorry if this is a trivial question, but can someone give me a hint about this behaviour and possibly how to improve it?

Thank you

Best regards


Can you communicate with those distributed processes? Are they running on your local machine? I guess so…
Is it possible that you have used a lot of memory and have started swapping on your local machine?

You are right. Single machine, multiple processes.

It may have to do with the overhead of pmap. In general, pmap should be used when the function g does a large amount of work, enough to offset the cost of that overhead. In your case, the function g is much faster than the time it takes to send the work out to the different processes. If you want to parallelize a fast function such as your g, either use low-level primitives such as @spawnat and fetch, or a macro like @distributed on your for loop.
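To illustrate that suggestion, here is a minimal sketch of a reducing @distributed loop for this kind of fast function. The worker count of 4 is an assumption; adjust it for your machine.

```julia
using Distributed
addprocs(4)  # hypothetical worker count; adjust for your machine

# A reducing @distributed loop partitions the range across the workers
# once, so the per-iteration overhead is tiny compared to pmap's
# per-call scheduling.
total = @distributed (+) for i in 1:10_000_000
    rand()
end
println(total / 10_000_000)  # sample mean, close to 0.5
```

With a reducer such as (+), @distributed blocks until all workers finish and returns the combined result on the calling process.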

What do these return after you addprocs?

@affans makes an excellent point. If you are going to parallelize by distributing functions, you need to give each worker a ‘decent’ amount of work to do. My apologies to non-English native speakers.
As @affans says, the time taken to do the task should be greater than the time to communicate, i.e. to send out and return the data.

Using your suggestions, I got it to work better (not for the MWE posted here) by making g compute more. Something along these lines:

@everywhere g(x) = rand(1_000_000)
@time pmap(g, 1:100);
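If pmap is still preferred for the original tiny g, note that it also accepts a batch_size keyword, which ships many inputs per worker round-trip instead of one. A minimal sketch, again assuming four local workers:

```julia
using Distributed
addprocs(4)  # hypothetical worker count; adjust for your machine

@everywhere g(x) = rand()

# batch_size groups 10_000 calls into one message per round-trip, so
# pmap's scheduling overhead is paid once per batch, not once per call.
results = pmap(g, 1:1_000_000; batch_size=10_000);
println(length(results))
```

The batch size is a tuning knob: larger batches amortize more overhead but reduce load-balancing granularity.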

I still think that pmap is not the way to go here (in the context of your function). See this excerpt from the documentation:

Julia’s pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny, perhaps merely summing two numbers. Only worker processes are used by both pmap and @distributed for for the parallel computation. In case of @distributed for , the final reduction is done on the calling process.

Can you try @distributed for and see if it speeds up your result?
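For completeness, a sketch of what that might look like when the individual results are needed rather than just a reduction, using a SharedArray so that all local workers can write into the same buffer (the names and sizes here are illustrative):

```julia
using Distributed, SharedArrays
addprocs(4)  # hypothetical worker count; adjust for your machine

@everywhere g(x) = rand()

n = 1_000_000
out = SharedArray{Float64}(n)  # shared among processes on one machine

# @distributed splits 1:n across the workers once; @sync waits until
# every worker has finished its chunk.
@sync @distributed for i in 1:n
    out[i] = g(i)
end
println(sum(out) / n)  # sample mean, close to 0.5
```

Note that SharedArray only works for processes on a single machine, which matches the single-machine setup described above.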

This thread may help too; it echoes what was said above: