Weird behavior of pmap

I have observed the following suboptimal behavior of the parallel map; the four timings correspond, in order, to the four @btime calls below:

  53.927 s (82132566 allocations: 2.81 GiB)
  2.011 s (8000041 allocations: 167.85 MiB)
  18.922 ms (2 allocations: 7.63 MiB)
  20.058 ms (3 allocations: 7.63 MiB)

The above was obtained with

using BenchmarkTools
using Distributed

N = 1_000_000
a = rand(N) + rand(N)*1im

addprocs(2)
@btime aa = pmap(x -> abs(x), $a)   # 2 workers: one remote call per element
rmprocs(workers())
@btime aa = pmap(x -> abs(x), $a)   # no workers: runs on the master process
@btime aa = abs.($a)                # serial broadcast
@btime aa = map(x -> abs(x), $a)    # serial map

pmap allocates a lot of memory and is slow.

I don’t have an explanation for that. Do you?

Julia Version 1.3.0-DEV.466
Commit 8d4f6d24c0 (2019-06-28 18:40 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\PetrKrysl\AppData\Local\atom\app-1.38.2\atom.exe"  -a
  JULIA_NUM_THREADS = 4

Bump.

pmap uses interprocess communication a million times here, once per element, to execute the function. That’s a lot of overhead relative to a vectorized function call, so why would you expect it to be as fast? pmap is intended for executing functions with non-trivial run-time.
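
Back-of-envelope arithmetic (the per-call latency below is an assumption for illustration, not a measurement) shows the scale of that overhead:

n_calls = 1_000_000
latency = 50e-6      # assumed seconds per remote-call round trip
n_calls * latency    # 50.0 s, the same order as the 53.9 s measured above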

I suppose pmap could be designed to fall back to regular map when no workers are registered, but what’s the point of calling pmap without any workers?
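
For what it’s worth, a minimal sketch of such a fallback (fallback_pmap is a hypothetical name, not part of the Distributed API):

using Distributed

# workers() returns [1] when only the master process exists.
fallback_pmap(f, c) = workers() == [1] ? map(f, c) : pmap(f, c)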

That is weird. The documentation says: "By default, pmap distributes the computation over all specified workers." Frankly, I expected the parallel map to divide the collection into buckets to pass to the workers, one bucket per worker, not to make a remote call for each individual element of the collection. It might be worthwhile to state this explicitly in the documentation.
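
Something like this hypothetical chunked version is what I had in mind (chunked_pmap is my own name, not part of Distributed):

using Distributed

# Split the collection into one block per worker, ship each block in a
# single message, and run a plain map on it remotely.
function chunked_pmap(f, c)
    blocks = collect(Iterators.partition(c, cld(length(c), nworkers())))
    reduce(vcat, pmap(block -> map(f, block), blocks))
end

chunked_pmap(abs, a)   # a handful of messages instead of a million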

Yeah, pmap is lower level than that. Maybe check out something like
https://github.com/JuliaParallel/Blocks.jl

You can specify a batch size in pmap (the batch_size keyword). However, from the docs:

"Julia’s pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny, perhaps merely summing two numbers."

That being said, pmap is good for scheduling computations that don’t take the same amount of time:

julia> @everywhere f(i)=(sleep(i);println("$i,done"))

julia> @time pmap(f,4:-1:1)
      From worker 5:	3,done
      From worker 4:	4,done
      From worker 4:	1,done
      From worker 5:	2,done
  5.013898 seconds (345 allocations: 24.656 KiB)
4-element Array{Nothing,1}:
 nothing
 nothing
 nothing
 nothing

julia> @time @sync @distributed for i in 4:-1:1
       f(i)
       end
      From worker 5:	2,done
      From worker 5:	1,done
      From worker 4:	4,done
      From worker 4:	3,done
  7.065881 seconds (184.38 k allocations: 9.019 MiB)
Task (done) @0x00000001074a3610
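
And for tiny per-element work like the abs example above, a @distributed loop filling a SharedArray is one option. A sketch (same-host workers only, since SharedArrays needs shared memory):

using Distributed
addprocs(2)
@everywhere using SharedArrays

a   = rand(1_000_000) + rand(1_000_000)*1im
out = SharedArray{Float64}(length(a))

# Each worker writes the results for its own range of indices;
# note that `a` itself is copied to every worker by the closure.
@sync @distributed for i in eachindex(a)
    out[i] = abs(a[i])
end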

Cheers!
