Benchmarking Parallel Computing Tools

I am trying to compare the performance of a few functions that each solve the same set of differential equation problems. According to the benchmark results, the functions that use threads and @distributed for perform better than the one that solves the problems sequentially. What surprised me, though, is that the function that uses pmap performs worst of all. I think the problem is related to something like data movement, but I am not sure. Am I doing something wrong in the benchmarking?

Here is the case.

using BenchmarkTools
using DifferentialEquations 
using Distributed 

addprocs(8)
@everywhere using DifferentialEquations
@everywhere function f(dx, x, u, t)   # Lorenz system
    dx[1] = 10 * (x[2] - x[1]) 
    dx[2] = x[1] * (28 - x[3]) - x[2] 
    dx[3] = x[1] * x[2] - 8 / 3 * x[3]
end

runsequential(probs)     = for prob in probs solve(prob) end                      # serial baseline
runthreaded(probs)       = Threads.@threads for prob in probs solve(prob) end     # multithreaded loop
rundistributedfor(probs) = @sync @distributed for prob in probs solve(prob) end   # distributed for loop
runpmap(probs)           = pmap(solve, probs)                                     # distributed map

probs = [ODEProblem(f, rand(3), (0., 100.)) for n in 1:100]   # 100 problems with random initial conditions

benchsequential = @benchmark runsequential($probs)  
benchthreaded = @benchmark runthreaded($probs)  
benchdistributedfor = @benchmark rundistributedfor($probs)  
benchpmap = @benchmark runpmap($probs)  

foreach(display, [benchsequential, benchthreaded, benchdistributedfor, benchpmap])
BenchmarkTools.Trial: 
  memory estimate:  143.06 MiB
  allocs estimate:  1300850
  --------------
  minimum time:     100.788 ms (7.03% GC)
  median time:      114.137 ms (7.70% GC)
  mean time:        115.754 ms (8.67% GC)
  maximum time:     150.830 ms (6.87% GC)
  --------------
  samples:          44
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  143.06 MiB
  allocs estimate:  1300891
  --------------
  minimum time:     33.571 ms (0.00% GC)
  median time:      43.118 ms (0.00% GC)
  mean time:        79.332 ms (49.96% GC)
  maximum time:     191.161 ms (80.89% GC)
  --------------
  samples:          64
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  411.44 KiB
  allocs estimate:  13218
  --------------
  minimum time:     37.872 ms (0.00% GC)
  median time:      50.666 ms (0.00% GC)
  mean time:        54.386 ms (0.00% GC)
  maximum time:     95.379 ms (0.00% GC)
  --------------
  samples:          92
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  226.99 MiB
  allocs estimate:  2810204
  --------------
  minimum time:     944.815 ms (0.00% GC)
  median time:      993.343 ms (8.28% GC)
  mean time:        1.025 s (10.96% GC)
  maximum time:     1.201 s (27.09% GC)
  --------------
  samples:          5
  evals/sample:     1
julia> versioninfo() 
Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = "/snap/code/55/usr/share/code/code"
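
One check on the data-movement idea that I have not run yet (so this is only a sketch, and runpmapnosol is just a name I made up for it) is a pmap variant that discards the solutions on the workers, so the solution objects are never serialized back to the master process:

# Same worker setup and `probs` as above; each worker returns nothing
# instead of the full solution object.
runpmapnosol(probs) = pmap(p -> (solve(p); nothing), probs)
benchpmapnosol = @benchmark runpmapnosol($probs)

If that variant were still much slower than @distributed, the overhead would presumably be in the per-problem scheduling and communication rather than in returning the results.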

Yes. I describe it in more detail here:

https://mitmath.github.io/18337/lecture6/styles_of_parallelism.html

pmap is a distributed implementation with a dynamic scheduler, so for an f with a small and non-random cost it's not going to be as good as @distributed, but that changes when the call times are very stochastic and as the cost of f increases.
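
As a rough sketch of how to reduce that per-task overhead (I have not benchmarked this here, and runpmapbatched is just an illustrative name), pmap takes a batch_size keyword that sends the problems to the workers in chunks, so the scheduling and communication round trip is paid once per batch rather than once per problem:

# Reuses the worker setup and `probs` from the post above.
# Each remote call now processes a chunk of problems, amortizing the
# dynamic scheduler's per-task round trip over the whole batch.
runpmapbatched(probs) = pmap(solve, probs; batch_size = 10)
@benchmark runpmapbatched($probs)

The batch size of 10 is arbitrary; the trade-off is fewer scheduling round trips versus coarser load balancing.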

@ChrisRackauckas Thank you for the great tutorial.