Order of minimum and average when configuring `samples` and `evals` in BenchmarkTools

Apparently I am not the first one to be confused about the difference between the `samples` and `evals` parameters in BenchmarkTools.jl; an earlier thread on this forum asked the same question.

(I know that `setup` is run anew for each sample but not for each evaluation, but that is not what my question is about.)

I am primarily interested in the `@belapsed` macro, which according to the documentation is a minimum over samples and/or evaluations:

`@belapsed` returns the minimum time in seconds.

The documentation also says:

  • `samples`: The number of samples to take. Execution will end if this many samples have been collected. Defaults to `BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000`.
  • `evals`: The number of evaluations per sample. For best results, this should be kept consistent between trials. A good guess for this value can be automatically set on a benchmark via `tune!`, but using `tune!` can be less consistent than setting `evals` manually. Defaults to `BenchmarkTools.DEFAULT_PARAMETERS.evals = 1`.

In that thread, @baggepinnen quoted the following explanation (I am not sure of the original source):

The reasoning behind our definition of “sample” may not be obvious to all readers. If the time to execute a benchmark is smaller than the resolution of your timing method, then a single evaluation of the benchmark will generally not produce a valid sample. In that case, one must approximate a valid sample by recording the total time t it takes to record n evaluations, and estimating the sample’s time per evaluation as t/n .
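
The t/n estimate described in this quote can be sketched with plain timing primitives; here `f` and `n` are placeholders of my own, not BenchmarkTools API:

```julia
# Sketch of the t/n per-evaluation estimate described above,
# using only Base Julia's time_ns().
function sample_estimate(f, n)
    t0 = time_ns()
    for _ in 1:n
        f()                      # n evaluations back to back
    end
    (time_ns() - t0) / n         # estimated time per evaluation, in nanoseconds
end

est = sample_estimate(() -> sum(rand(100)), 1_000)
```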

So, if I am reading this correctly, then calling `@belapsed f() evals=20 samples=100` will do something like

times = zeros(20)
for i in 1:20
    times[i] = sum(@elapsed f() for j in 1:100) / 100
end
minimum(times)

OTOH, the following REPL session seems to suggest the opposite: that we take the average over `evals`, then the minimum over `samples`:

julia> using BenchmarkTools

julia> @belapsed rand(10) evals=200000 samples=1

julia> @belapsed rand(10) evals=1 samples=200000

Indeed, this seems to be the only interpretation consistent with the definition of `@belapsed` as a minimum time, since the default value of `evals` is one.
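
One way to probe this empirically, assuming I am reading the `Trial` interface correctly (each entry of `trial.times` should be one sample's time in nanoseconds, already divided by `evals`):

```julia
using BenchmarkTools

# Build and run a benchmark with explicit evals/samples.
b = @benchmarkable rand(10)
trial = run(b, evals=20, samples=100)

# If the averaging happens inside each sample, then @belapsed's result
# should correspond to the minimum over the per-sample times:
min_seconds = minimum(trial.times) / 1e9
```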


  • Is it correct to say that `@belapsed` takes the average over `evals`, then the minimum over `samples`?

  • If I am trying to benchmark a function that doesn't require any setup, is it generally safe to leave `evals` alone and manipulate only `samples`?

  • Suppose there is a `setup` clause. Then (as I understand it) calling `@belapsed f(a) setup=(a=rand()) evals=20 samples=100` is equivalent to drawing 100 random `a` values, computing `f` 20 times for each `a` value, taking the average over the 20 reps, then taking the minimum over the 100 instances. What if I want to switch the order of the average and the minimum? That is, draw 100 `a` values, compute `f` 20 times for each `a` value, take the minimum over the 20 reps, then take the average over the 100 instances? I am currently doing this with my own nested `for` loop and the vanilla `@elapsed` macro; it looks like this isn't doable directly with the syntax of BenchmarkTools.
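
For the record, the nested loop I mention can be sketched like this (my own helper with made-up names, not BenchmarkTools API):

```julia
# Average-of-minima: the reverse order of @belapsed's minimum-of-averages.
# gen() draws a fresh problem instance; plain @elapsed, no BenchmarkTools.
function avg_of_min(f, gen; instances=100, reps=20)
    mins = Float64[]
    for _ in 1:instances
        a = gen()                                    # plays the role of setup=(a=rand())
        push!(mins, minimum(@elapsed(f(a)) for _ in 1:reps))
    end
    sum(mins) / instances                            # seconds
end

t = avg_of_min(sort, () -> rand(1_000))
```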

(FWIW, for actually benchmarking functions in a practical context, I realize it would be better to use the `seconds` parameter and `tune!()`. The reason I ask this question is that I need to describe my benchmarking method precisely / reproducibly in an academic paper for a non-Julian audience, using language such as “We tested our algorithm on 100 randomly generated problem instances. For each instance, we ran the algorithm 20 times and recorded the minimum as the computation time. Then, in Table 1, we report the average of the 100 minima.”)

Another example that doesn’t seem to clarify much:

using BenchmarkTools

function benchmarkcomparison()
    # t1: explicit minimum over 20 of (average over 100 evaluations)
    t1 = minimum(sum(@elapsed rand(10) for _ in 1:100) for _ in 1:20) / 100
    t2 = @belapsed rand(10) evals = 100 samples = 20
    t3 = @belapsed rand(10) evals = 20 samples = 100

    @show t1
    @show t2
    @show t3
end

benchmarkcomparison()

# t1 = 1.494300000000001e-7
# t2 = 1.0133e-7
# t3 = 1.0070000000000001e-7