I have observed the following suboptimal behavior of the parallel map (the four timings correspond, in order, to the four @btime calls below):
53.927 s (82132566 allocations: 2.81 GiB)
2.011 s (8000041 allocations: 167.85 MiB)
18.922 ms (2 allocations: 7.63 MiB)
20.058 ms (3 allocations: 7.63 MiB)
The above was obtained with
N = 1_000_000
a = rand(N) + rand(N)*1im
using BenchmarkTools
using Distributed
addprocs(2)
@btime aa = pmap(x -> abs(x), $a)
rmprocs(workers())
@btime aa = pmap(x -> abs(x), $a)
@btime aa = abs.($a)
@btime aa = map(x -> abs(x), $a)
pmap allocates a lot of memory, and it is slow.
I don’t have an explanation for that. Do you?
Julia Version 1.3.0-DEV.466
Commit 8d4f6d24c0 (2019-06-28 18:40 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = "C:\Users\PetrKrysl\AppData\Local\atom\app-1.38.2\atom.exe" -a
JULIA_NUM_THREADS = 4
pmap uses interprocess communication 1 million times to execute the function. That's a lot of overhead relative to a vectorized function call, so why would you expect it to be as fast? It is intended for executing functions with non-trivial run-time.
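To illustrate, one way to amortize that overhead is to hand each worker a single large chunk and run a serial map on it. This is just a sketch, and the chunking scheme is my own, not something from the thread:

```julia
using Distributed
addprocs(2)

N = 1_000_000
a = rand(N) + rand(N) * 1im

# One contiguous block per worker: each block travels in a single
# message, and the worker runs a plain serial map over it.
step = cld(N, nworkers())
chunks = [a[i:min(i + step - 1, N)] for i in 1:step:N]
aa = reduce(vcat, pmap(chunk -> map(abs, chunk), chunks))
```

This turns a million remote calls into two, which is why it is so much closer to the serial timings.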
I suppose pmap could be designed to fall back to regular map when no workers are registered, but what's the point of calling pmap without any workers?
That is weird. The documentation says "By default, pmap distributes the computation over all specified workers." Frankly, I expected the parallel map to divide the collection into buckets to pass to the workers, one bucket per worker, not to call a worker for each individual element of the collection. Perhaps it would be worthwhile to state this explicitly in the documentation?
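For what it's worth, pmap already has a knob for this: the `batch_size` keyword groups elements into batches, so each remote call carries many elements rather than one. A minimal sketch (the batch size of 50_000 is an arbitrary choice of mine):

```julia
using Distributed
addprocs(2)

a = rand(1_000_000) + rand(1_000_000) * 1im

# batch_size makes pmap ship batches of ~50_000 elements per message
# instead of one element per remote call, which removes most of the
# interprocess-communication overhead.
aa = pmap(abs, a; batch_size = 50_000)
```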
Yeah pmap is lower level than that. Maybe check out something like
https://github.com/JuliaParallel/Blocks.jl
You can specify a batch size in pmap. However, from the docs:
"Julia's pmap is designed for the case where each function call does a large amount of work. In contrast, @distributed for can handle situations where each iteration is tiny, perhaps merely summing two numbers."
That being said, pmap is good for scheduling computations that don't take the same amount of time:
julia> @everywhere f(i)=(sleep(i);println("$i,done"))
julia> @time pmap(f,4:-1:1)
From worker 5: 3,done
From worker 4: 4,done
From worker 4: 1,done
From worker 5: 2,done
5.013898 seconds (345 allocations: 24.656 KiB)
4-element Array{Nothing,1}:
nothing
nothing
nothing
nothing
julia> @time @sync @distributed for i in 4:-1:1
f(i)
end
From worker 5: 2,done
From worker 5: 1,done
From worker 4: 4,done
From worker 4: 3,done
7.065881 seconds (184.38 k allocations: 9.019 MiB)
Task (done) @0x00000001074a3610
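And for the tiny-iteration case the docs quote mentions, a `@distributed` reduction is the better fit; a sketch (the workload here is made up for illustration):

```julia
using Distributed
addprocs(2)

# Each worker sums its share of the range locally; the (+) reducer then
# combines the per-worker partial sums on the caller, so there is no
# remote call per iteration.
s = @distributed (+) for i in 1:1_000_000
    abs(i + i * 1im)
end
```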
Cheers!