Weird behavior of pmap

I have observed the following suboptimal behavior of the parallel map; the four timings correspond, in order, to the four @btime calls below:

  53.927 s (82132566 allocations: 2.81 GiB)
  2.011 s (8000041 allocations: 167.85 MiB)
  18.922 ms (2 allocations: 7.63 MiB)
  20.058 ms (3 allocations: 7.63 MiB)

The above was obtained with

using BenchmarkTools
using Distributed

N = 1_000_000
a = rand(N) + rand(N)*1im

addprocs(2)
@btime aa = pmap(x -> abs(x), $a)   # 2 workers: one remote call per element
rmprocs(workers())
@btime aa = pmap(x -> abs(x), $a)   # no workers: runs on the master process
@btime aa = abs.($a)                # serial broadcast
@btime aa = map(x -> abs(x), $a)    # serial map

pmap allocates a lot of memory and is slow.

I don’t have an explanation for that. Do you?

Julia Version 1.3.0-DEV.466
Commit 8d4f6d24c0 (2019-06-28 18:40 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\PetrKrysl\AppData\Local\atom\app-1.38.2\atom.exe"  -a
  JULIA_NUM_THREADS = 4

Bump.

pmap uses interprocess communication a million times here, once per element, to execute the function. That’s a lot of overhead relative to a vectorized function call, so why would you expect it to be as fast? pmap is intended for executing functions with non-trivial run-time.
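
Back-of-envelope arithmetic (the per-call latency below is an assumption for illustration, not a measurement) shows the scale of that overhead:

n_calls = 1_000_000
latency = 50e-6      # assumed seconds per remote-call round trip
n_calls * latency    # 50.0 s, the same order as the 53.9 s measured above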

I suppose pmap could be designed to fall back to regular map when no workers are registered, but what’s the point of calling pmap without any workers?
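
For what it’s worth, a minimal sketch of such a fallback (fallback_pmap is a hypothetical name, not part of the Distributed API):

using Distributed

# workers() returns [1] when only the master process exists.
fallback_pmap(f, c) = workers() == [1] ? map(f, c) : pmap(f, c)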

That is weird. The documentation says: "By default, pmap distributes the computation over all specified workers." Frankly, I expected the parallel map to divide the collection into buckets to pass to the workers, one bucket per worker, not to make a remote call for each individual element of the collection. It might be worthwhile to state this explicitly in the documentation.
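
Something like this hypothetical chunked version is what I had in mind (chunked_pmap is my own name, not part of Distributed):

using Distributed

# Split the collection into one block per worker, ship each block in a
# single message, and run a plain map on it remotely.
function chunked_pmap(f, c)
    blocks = collect(Iterators.partition(c, cld(length(c), nworkers())))
    reduce(vcat, pmap(block -> map(f, block), blocks))
end

chunked_pmap(abs, a)   # a handful of messages instead of a million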

Yeah, pmap is lower level than that. Maybe check out something like
https://github.com/JuliaParallel/Blocks.jl

You can specify a batch size in pmap (the batch_size keyword). However, from the docs:

"Julia’s pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny, perhaps merely summing two numbers."

That being said, pmap is good for scheduling computations that don’t take the same amount of time:

julia> @everywhere f(i)=(sleep(i);println("$i,done"))

julia> @time pmap(f,4:-1:1)
      From worker 5:	3,done
      From worker 4:	4,done
      From worker 4:	1,done
      From worker 5:	2,done
  5.013898 seconds (345 allocations: 24.656 KiB)
4-element Array{Nothing,1}:
 nothing
 nothing
 nothing
 nothing

julia> @time @sync @distributed for i in 4:-1:1
       f(i)
       end
      From worker 5:	2,done
      From worker 5:	1,done
      From worker 4:	4,done
      From worker 4:	3,done
  7.065881 seconds (184.38 k allocations: 9.019 MiB)
Task (done) @0x00000001074a3610
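
And for tiny per-element work like the abs example above, a @distributed loop filling a SharedArray is one option. A sketch (same-host workers only, since SharedArrays needs shared memory):

using Distributed
addprocs(2)
@everywhere using SharedArrays

a   = rand(1_000_000) + rand(1_000_000)*1im
out = SharedArray{Float64}(length(a))

# Each worker writes the results for its own range of indices;
# note that `a` itself is copied to every worker by the closure.
@sync @distributed for i in eachindex(a)
    out[i] = abs(a[i])
end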

Cheers!
