Most likely, what you’re seeing here is only the memory consumption of the launching process (which has to wait for the spawned tasks to complete, and can thus measure the correct elapsed time).
To get the total memory consumption, I don’t know of any way other than benchmarking the individual spawned tasks.
Here is a complete example:
Sequential version
First, a plain, sequential version to be used as reference:
julia> using BenchmarkTools
julia> f_seq(n) = map(1:n) do i
sleep(1)
sum([0.1*j for j in 1:10_000*i])
end
f_seq (generic function with 1 method)
julia> @btime f_seq(10)
10.028 s (72 allocations: 4.20 MiB)
10-element Array{Float64,1}:
5.0005e6
2.0001e7
4.50015e7
8.0002e7
1.250025e8
1.80003e8
2.450035e8
3.20004e8
4.050045e8
5.00005e8
pmap-based version
As you showed, a simple parallel version allows measuring the speedup, but not the memory usage:
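julia> using Distributed  # provides pmap; worker processes are assumed to have been added already (julia -p or addprocs)
julia> f_par(n) = pmap(1:n) do i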
sleep(1)
sum([0.1*j for j in 1:10_000*i])
end
f_par (generic function with 1 method)
julia> @btime f_par(10)
3.014 s (743 allocations: 30.97 KiB)
10-element Array{Float64,1}:
5.0005e6
2.0001e7
4.50015e7
8.0002e7
1.250025e8
1.80003e8
2.450035e8
3.20004e8
4.050045e8
5.00005e8
Timed parallel version
If you time each task and then collect the statistics, you can retrieve the total memory usage. In the following, the pmap function body is wrapped in @timed to get its resource usage; the global result of pmap is then post-processed to retrieve the calculation results and accumulate the statistics:
julia> f_par_timed(n) = pmap(1:n) do i
@timed begin
sleep(1)
sum([0.1*j for j in 1:10_000*i])
end
end |> post
f_par_timed (generic function with 1 method)
julia> post(res) = begin
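total_time = sum(x[2] for x in res)   # x[2]: elapsed time of one task, in seconds
@show total_time
total_bytes = sum(x[3] for x in res)  # x[3]: bytes allocated by one task
@show total_bytes
[x[1] for x in res]                   # x[1]: the value computed by the task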
end
post (generic function with 1 method)
julia> f_par_timed(10)
total_time = 10.028683662
total_bytes = 4402720
10-element Array{Float64,1}:
5.0005e6
2.0001e7
4.50015e7
8.0002e7
1.250025e8
1.80003e8
2.450035e8
3.20004e8
4.050045e8
5.00005e8
This gives a cumulative elapsed time of ~10 s and a cumulative memory usage of ~4.4 MB (4402720 bytes, i.e. 4.20 MiB, matching the sequential version above). The ~31 KiB reported when benchmarking pmap above should probably be added to this.
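If you would rather get the statistics back as values than have them printed with @show, the same idea can be packaged as a small helper. This is only a minimal sketch: `pmap_timed` is a hypothetical name (it is not part of Distributed), and it relies on the same positional layout of the @timed result (value, elapsed time, bytes) that `post` uses above.

```julia
using Distributed

# Hypothetical helper, not part of Distributed: wraps every task in @timed
# and returns the results together with the summed per-task statistics.
function pmap_timed(f, xs)
    # Each element of `stats` is the @timed result of one task.
    stats = pmap(x -> @timed(f(x)), xs)
    results     = [s[1] for s in stats]       # s[1]: value computed by the task
    total_time  = sum(s[2] for s in stats)    # s[2]: elapsed seconds of the task
    total_bytes = sum(s[3] for s in stats)    # s[3]: bytes allocated by the task
    return (results = results, total_time = total_time, total_bytes = total_bytes)
end

# Usage: same task body as above, without the sleep.
out = pmap_timed(1:4) do i
    sum([0.1*j for j in 1:10_000*i])
end
```

The summed time is the total work done across tasks, not the wall-clock time of the whole pmap call, which is why it stays close to the sequential figure.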
I do not know whether there are other significant sources of memory usage that are captured neither by benchmarking pmap itself nor by timing the function bodies. I should also add that these measurements are based on @timed, which means that they are probably less reliable than what BenchmarkTools would provide.
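One reason a single @timed measurement is noisy is that the first call of a function may include compilation. A rough way to reduce that effect, assuming the task body is cheap enough to run several times, is to time it repeatedly and keep the fastest sample. The helper below, `timed_best`, is a hypothetical name used only to illustrate the idea; it is not how BenchmarkTools works internally.

```julia
# Hypothetical helper: evaluate `f()` `samples` times and keep the @timed
# result with the smallest elapsed time, so that one-off costs such as
# compilation on the first call do not dominate the measurement.
function timed_best(f; samples = 3)
    best = @timed f()
    for _ in 2:samples
        s = @timed f()
        # s[2] is the elapsed time in seconds
        s[2] < best[2] && (best = s)
    end
    return best
end

# Usage: time the body of one task three times, keep the fastest run.
best = timed_best() do
    sum([0.1*j for j in 1:10_000])
end
```

To use such a helper inside the pmap body, it would have to be defined on all workers first, for example with @everywhere.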