I saw some strange performance today and I’m wondering what I’m doing wrong here.

julia> using BenchmarkTools

julia> a = [1:1000000;];

julia> @btime map(log, a);
  13.449 ms (3 allocations: 7.63 MiB)

julia> addprocs(4)
4-element Array{Int64,1}:
 2
 3
 4
 5

julia> wp=CachingPool(workers())
CachingPool(Channel{Int64}(sz_max:9223372036854775807,sz_curr:4), Set([4, 2, 3, 5]), Dict{Tuple{Int64,Function},RemoteChannel}())

julia> @btime pmap(wp, log, a);
  133.079 s (119748792 allocations: 3.38 GiB)

I did notice that the bottleneck seemed to be the master process: it sat at 90+% CPU consistently, while the workers were between 30% and 40%. Still, I was surprised at how much slower the pmap version was (about 133 s versus about 13 ms for plain map). Any ideas? If I had to guess, it's because the work per element is tiny, so the run time is dominated by moving data between the master and the worker processes rather than by computing log. It'd be nice to have someone confirm this.
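
One way to test that guess is to send the elements to the workers in batches instead of one at a time, using pmap's `batch_size` keyword (the default is 1, i.e. one remote call per element). The following is only a sketch under a few assumptions: it targets Julia 0.7 or later, where `using Distributed` is required and the function comes before the pool in the `pmap` call (on 0.6, keep the pool first as in the session above), and the `batch_size` value of 10_000 is an arbitrary, untuned choice. If batching narrows the gap a lot, that would support the per-element communication explanation.

```julia
using Distributed   # assumption: Julia 0.7+; on 0.6 these functions are in Base

addprocs(4)

a = [1:1000000;]
wp = CachingPool(workers())

# batch_size is the maximum number of elements handed to a worker per
# remote call. 10_000 is a hypothetical value picked for illustration.
r_batched = pmap(log, wp, a; batch_size=10_000)

# Sanity check: the batched result should match the serial one.
r_serial = map(log, a)
println(r_batched ≈ r_serial)
```

Both calls can then be timed with `@btime` in the same way as above to compare against the `batch_size=1` run.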