Updating some code from Julia 0.5 to 1.0 massively slowed down the pmap calls in our use case.
Briefly, distributing the computation of f(x,arg) over the set X seems to copy and send arg during each iteration. This becomes a problem when the parameters in arg include large objects.
This can be reproduced in 0.6+ (tested on 0.6.4 and 1.0.0). The benchmarks below are from a fresh 1.0 install on a Windows machine (also reproduced on a Linux HPC):
using BenchmarkTools
VERSION.major < 1 || using Distributed
addprocs() ##4
@everywhere begin
bigarr = ones(10^8)
f_passall(a,x) = length(x) + a
end
its = 1:20
julia> @btime map(x->f_passall(x,bigarr), its);
940.280 ns (27 allocations: 736 bytes)
julia> @btime pmap(x->f_passall(x,bigarr), its);
2.283 s (1560 allocations: 97.86 KiB)
Redefining f to use bigarr as a global variable seems to fix the issue, at a cost:
@everywhere f_globals(a) = length(bigarr) + a
julia> @btime map(x->f_globals(x), its);
1.391 μs (47 allocations: 1.03 KiB)
julia> @btime pmap(x->f_globals(x), its);
881.018 μs (1493 allocations: 96.64 KiB)
Increasing the number of iterations further slows down the pmap call, proportionally:
its = 1:50;
julia> @btime pmap(x->f_passall(x,bigarr), its);
5.676 s (3834 allocations: 185.53 KiB)
julia> @btime pmap(x->f_globals(x), its);
2.169 ms (3658 allocations: 182.25 KiB)
The issue did not seem to occur in 0.5.0: f_passall and f_globals have comparable performance, and most of the time is spent on overhead (it stays roughly constant as its grows).
julia> @time pmap(x->f_passall(x,bigarr), 1:20);
0.290894 seconds (422.72 k allocations: 17.810 MB, 2.42% gc time)
julia> @time pmap(x->f_passall(x,bigarr), 1:50);
0.290469 seconds (427.01 k allocations: 17.937 MB, 2.49% gc time)
julia> @time pmap(x->f_globals(x), 1:20);
0.276240 seconds (422.46 k allocations: 17.765 MB)
julia> @time pmap(x->f_globals(x), 1:50);
0.288293 seconds (426.70 k allocations: 17.921 MB, 2.39% gc time)
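If the slowdown does come from the closure and its captured array being re-serialized for every element, one option that avoids the global-variable rewrite may be a CachingPool from Distributed, which is documented to send a closure to each worker only once. Below is a minimal sketch, not benchmarked here. The array is shrunk to 10^6 elements and the worker count of 2 is arbitrary, both chosen only to keep the example quick.

```julia
using Distributed
addprocs(2)  # arbitrary worker count for this sketch

@everywhere f_passall(a, x) = length(x) + a

bigarr = ones(10^6)  # assumption: smaller than the 10^8 array used above
its = 1:20

# A CachingPool transfers each distinct closure (with its captured data)
# to a given worker once and reuses the cached copy on later calls.
wp = CachingPool(workers())

# `let` turns bigarr into a local so the closure captures it directly.
result = let bigarr = bigarr
    pmap(x -> f_passall(x, bigarr), wp, its)
end

clear!(wp)  # drop the cached closures from the workers
```

Whether this recovers the 0.5 timings would need to be checked with @btime on the full-size array.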
Happy to create an issue if this is unintended behavior by pmap.