Unstable execution time with high standard error in Julia

Thanks for your reply! I pay attention to the standard deviation because it is one of the criteria I use to measure the stability of an algorithm. In fact, I also record the mean, median, and standard deviation. However, when I see such a large standard deviation, I feel very puzzled about why some cases take so much time. I need to figure out whether it’s an issue with the algorithm or the code.

Standard deviation is probably not a good measure for distributions that are extremely far from Gaussian, especially multi-modal ones like here, though it can give a qualitative impression of “large variability”.
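To illustrate the point, here is a minimal sketch with made-up timings (not the data from this thread): a couple of extreme samples inflate the standard deviation, while the median, the interquartile range, and the median absolute deviation barely move. It only uses the Statistics standard library.

```julia
using Statistics

# Synthetic timings in microseconds (made-up data): 98 fast samples, 2 slow outliers.
times_us = vcat(fill(160.0, 49), fill(170.0, 49), [3000.0, 3000.0])

m   = mean(times_us)
s   = std(times_us)
med = median(times_us)
q25, q75 = quantile(times_us, [0.25, 0.75])
# Median absolute deviation: a spread measure that is insensitive to rare extreme samples.
mad = median(abs.(times_us .- med))

println("mean ± std   = ", round(m, digits=1), " ± ", round(s, digits=1), " μs")
println("median (IQR) = ", med, " (", q25, " to ", q75, ") μs")
println("MAD          = ", mad, " μs")
```

With a real benchmark you would apply the same functions to the recorded sample times instead of the synthetic vector.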

So you’re using wall-clock runtime as a proxy for something like “how many iteration steps did my adaptive algorithm need to go below the error threshold”?

That’s not bad for quick-and-dirty impressions and for end-to-end plausibility checks, but I would recommend logging that number directly, for many reasons (e.g. it also lets you separate improvements to the algorithm from improvements to the implementation).
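A minimal sketch of what logging the iteration count directly could look like. The solver here (`newton_sqrt`) is a hypothetical stand-in, not the algorithm discussed in this thread; the idea is only that the function returns its iteration count alongside the result, so it can be recorded next to the wall-clock time.

```julia
# Hypothetical adaptive solver: Newton iteration for sqrt(a), used only as a stand-in.
function newton_sqrt(a::Float64; tol::Float64 = 1e-12, maxiter::Int = 100)
    x = max(a, 1.0)          # initial guess
    iters = 0
    while abs(x * x - a) > tol * max(a, 1.0) && iters < maxiter
        x = 0.5 * (x + a / x)
        iters += 1
    end
    return (value = x, iterations = iters)
end

newton_sqrt(2.0)  # warm-up call so compilation is not included in the timings below

# Record iteration count and wall-clock time side by side for each input.
for a in (2.0, 1e3, 1e6)
    stats = @timed newton_sqrt(a)
    println("a = ", a, ": iterations = ", stats.value.iterations,
            ", time = ", stats.time, " s")
end
```

If the iteration counts are stable but the times are not, the variability comes from the implementation or the environment rather than from the algorithm.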

In fact, the variance of the iteration count is small, while the variance of the running time of the whole algorithm is relatively large. This confuses me, so I want to get to the bottom of it.

Depending on the RAM you have, you can also just turn off the garbage collector during the benchmark to see if that solves the issue:

using BenchmarkTools
GC.enable(false)            # disable garbage collection during the measurement
@benchmark your_function()
GC.enable(true)             # re-enable it afterwards
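If the benchmarked code can throw, the garbage collector would stay disabled for the rest of the session. A small sketch of a wrapper that restores the previous state in all cases; `with_gc_disabled` is a hypothetical helper name, and it relies on `GC.enable` returning the previous GC state.

```julia
# Hypothetical helper: run f() with the GC disabled and always restore the old state.
function with_gc_disabled(f)
    was_enabled = GC.enable(false)   # GC.enable returns the previous state
    try
        return f()
    finally
        GC.enable(was_enabled)       # runs even if f() throws
    end
end

# Usage with a stand-in workload; replace the closure body with the real measurement.
result = with_gc_disabled() do
    sum(rand(1_000))
end
println(result)
```

Keep in mind that with the GC off, memory use grows with every allocation, so this is only suitable for short runs.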

Thanks for your advice! The variance became smaller.
Before GC.enable(false), the result is

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  147.334 μs …   3.006 ms  ┊ GC (min … max): 0.00% … 93.81%
 Time  (median):     164.062 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   174.205 μs ± 131.898 μs  ┊ GC (mean ± σ):  5.59% ±  6.78%

                ▁▂▃▃▁▁▁▂▂▆▅▇██▇▆▅▃▃▁
  ▁▁▁▁▁▁▂▂▂▂▃▄▅██████████████████████▇▇▆▆▅▅▄▃▄▃▃▃▂▂▂▂▂▂▁▂▂▁▁▁▁▁ ▄
  147 μs           Histogram: frequency by time          186 μs <

 Memory estimate: 385.34 KiB, allocs estimate: 268.

After GC.enable(false), the result is

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  153.250 μs …  1.817 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     182.520 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   187.567 μs ± 56.816 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▃▅▇▇█▆▆▅▄▅▄▆▅▄▂▃▂
  ▂▂▃▅▆▇█████████████████████▇▅▅▅▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▄
  153 μs          Histogram: frequency by time          261 μs <

 Memory estimate: 385.34 KiB, allocs estimate: 268.
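To check how much of the spread comes from garbage collection without turning it off, one option is to tag each sample by whether the GC ran during it. This is a rough sketch using `@timed` from Base (its result has `time` and `gctime` fields in recent Julia versions); `work` is a hypothetical allocating workload, not the function benchmarked above.

```julia
using Statistics

# Hypothetical allocating workload standing in for the benchmarked function.
work() = sum(rand(10_000))

work()  # warm-up call so compilation time is not measured

# Collect elapsed time and GC time for each sample.
samples = [@timed(work()) for _ in 1:2_000]
times   = [s.time for s in samples]
gc_hit  = [s.gctime > 0 for s in samples]

println("samples with GC activity: ", count(gc_hit), " of ", length(samples))
if any(.!gc_hit)
    println("median time, no GC:  ", median(times[.!gc_hit]), " s")
end
if any(gc_hit)
    println("median time, GC ran: ", median(times[gc_hit]), " s")
end
```

If the samples without GC activity still show a long tail, the remaining variability has another source.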

Other factors that can contribute to the variance:

  • other running processes; close all browser windows and VSCode when benchmarking
  • make sure you are not using a laptop that is running on battery, otherwise power management causes variations in performance
  • if you have a CPU with both slow (efficiency) cores and performance cores, try to make sure that only the performance cores are used. ThreadPinning.jl (carstenbauer/ThreadPinning.jl on GitHub) can pin Julia threads to CPU threads, but it works only on Linux

I see. Thanks for your suggestion!