I wanted to understand the performance of Dagger.jl on my laptop and ran its benchmarks there, as described in the documentation. I ran with 1 process and 8 threads (on 8 cores). This produced this output:
benchmark results for suite nmf, exec dagger, accels String[]: 10-element BenchmarkTools.BenchmarkGroup:
tags: []
"NNMF scaled by: 16" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(2.737 s)
"NNMF scaled by: 41" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(1.236 s)
"NNMF scaled by: 36" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(634.934 ms)
"NNMF scaled by: 1" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(315.710 s)
"NNMF scaled by: 26" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(1.366 s)
"NNMF scaled by: 31" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(1.308 s)
"NNMF scaled by: 21" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(1.250 s)
"NNMF scaled by: 46" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(664.206 ms)
"NNMF scaled by: 11" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(2.396 s)
"NNMF scaled by: 6" => 1-element BenchmarkTools.BenchmarkGroup:
tags: []
"Workers: 1" => TrialEstimate(8.410 s)
benchmark results for suite nmf, exec raw, accels String[]: 10-element BenchmarkTools.BenchmarkGroup:
tags: []
"NNMF scaled by: 16" => TrialEstimate(228.647 ms)
"NNMF scaled by: 41" => TrialEstimate(621.973 ms)
"NNMF scaled by: 36" => TrialEstimate(502.466 ms)
"NNMF scaled by: 1" => TrialEstimate(15.768 ms)
"NNMF scaled by: 26" => TrialEstimate(410.510 ms)
"NNMF scaled by: 31" => TrialEstimate(476.929 ms)
"NNMF scaled by: 21" => TrialEstimate(346.264 ms)
"NNMF scaled by: 46" => TrialEstimate(695.892 ms)
"NNMF scaled by: 11" => TrialEstimate(183.921 ms)
"NNMF scaled by: 6" => TrialEstimate(92.942 ms)
Done!
What does this mean? Is the “scaled by” the problem size? Or should I scale the resulting times by that factor before comparing? What is a “TrialEstimate” – presumably the time it took to run the benchmark in that configuration? It’s suspicious that the “dagger” times are ten times higher than the “raw” times…