Benchmarking Parallel Computing Tools

I am trying to compare the performance of a few functions that each solve the same set of differential equation problems. According to the benchmark results, the functions that use threads and @distributed for perform better than the one that solves the problems sequentially. What surprised me, though, is that the function that uses pmap performs worst of all. I think the problem is related to something like data movement, but I am not sure. Am I doing something wrong in the benchmarking?

Here is the case.

using BenchmarkTools
using DifferentialEquations 
using Distributed 

addprocs(8)
@everywhere using DifferentialEquations
@everywhere function f(dx, x, u, t)   # Lorenz system
    dx[1] = 10 * (x[2] - x[1]) 
    dx[2] = x[1] * (28 - x[3]) - x[2] 
    dx[3] = x[1] * x[2] - 8 / 3 * x[3]
end

runsequential(probs)     = for prob in probs solve(prob) end                      # serial baseline
runthreaded(probs)       = Threads.@threads for prob in probs solve(prob) end     # multithreaded loop
rundistributedfor(probs) = @sync @distributed for prob in probs solve(prob) end   # distributed for loop
runpmap(probs)           = pmap(solve, probs)                                     # distributed map

probs = [ODEProblem(f, rand(3), (0., 100.)) for n in 1:100]   # 100 problems with random initial conditions

benchsequential = @benchmark runsequential($probs)  
benchthreaded = @benchmark runthreaded($probs)  
benchdistributedfor = @benchmark rundistributedfor($probs)  
benchpmap = @benchmark runpmap($probs)  

foreach(display, [benchsequential, benchthreaded, benchdistributedfor, benchpmap])
BenchmarkTools.Trial: 
  memory estimate:  143.06 MiB
  allocs estimate:  1300850
  --------------
  minimum time:     100.788 ms (7.03% GC)
  median time:      114.137 ms (7.70% GC)
  mean time:        115.754 ms (8.67% GC)
  maximum time:     150.830 ms (6.87% GC)
  --------------
  samples:          44
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  143.06 MiB
  allocs estimate:  1300891
  --------------
  minimum time:     33.571 ms (0.00% GC)
  median time:      43.118 ms (0.00% GC)
  mean time:        79.332 ms (49.96% GC)
  maximum time:     191.161 ms (80.89% GC)
  --------------
  samples:          64
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  411.44 KiB
  allocs estimate:  13218
  --------------
  minimum time:     37.872 ms (0.00% GC)
  median time:      50.666 ms (0.00% GC)
  mean time:        54.386 ms (0.00% GC)
  maximum time:     95.379 ms (0.00% GC)
  --------------
  samples:          92
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  226.99 MiB
  allocs estimate:  2810204
  --------------
  minimum time:     944.815 ms (0.00% GC)
  median time:      993.343 ms (8.28% GC)
  mean time:        1.025 s (10.96% GC)
  maximum time:     1.201 s (27.09% GC)
  --------------
  samples:          5
  evals/sample:     1
julia> versioninfo() 
Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = "/snap/code/55/usr/share/code/code"
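
One check on the data-movement idea that I have not run yet (so this is only a sketch, and runpmapnosol is just a name I made up for it) is a pmap variant that discards the solutions on the workers, so the solution objects are never serialized back to the master process:

# Same worker setup and `probs` as above; each worker returns nothing
# instead of the full solution object.
runpmapnosol(probs) = pmap(p -> (solve(p); nothing), probs)
benchpmapnosol = @benchmark runpmapnosol($probs)

If that variant were still much slower than @distributed, the overhead would presumably be in the per-problem scheduling and communication rather than in returning the results.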

Yes. I describe it in more detail here:

https://mitmath.github.io/18337/lecture6/styles_of_parallelism.html

pmap is a distributed implementation with a dynamic scheduler, so for an f with a small and non-random cost it's not going to be as good as @distributed, but that changes when the call times are very stochastic and as the cost of f increases.
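
As a rough sketch of how to reduce that per-task overhead (I have not benchmarked this here, and runpmapbatched is just an illustrative name), pmap takes a batch_size keyword that sends the problems to the workers in chunks, so the scheduling and communication round trip is paid once per batch rather than once per problem:

# Reuses the worker setup and `probs` from the post above.
# Each remote call now processes a chunk of problems, amortizing the
# dynamic scheduler's per-task round trip over the whole batch.
runpmapbatched(probs) = pmap(solve, probs; batch_size = 10)
@benchmark runpmapbatched($probs)

The batch size of 10 is arbitrary; the trade-off is fewer scheduling round trips versus coarser load balancing.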

@ChrisRackauckas Thank you for the great tutorial.