Apparently I am not the first one to be confused about the difference between the samples and evals parameters in BenchmarkTools.jl; see this earlier thread: BenchmarkTools.
(I know that setup is run anew for each sample but not for each eval, but this is not what my question is about.)
I am interested primarily in the @belapsed macro, which according to the documentation is a minimum over samples and/or evals:
@belapsed returns the minimum time in seconds.
The documentation also says:
samples: The number of samples to take. Execution will end if this many samples have been collected. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000.
…
evals: The number of evaluations per sample. For best results, this should be kept consistent between trials. A good guess for this value can be automatically set on a benchmark via tune!, but using tune! can be less consistent than setting evals manually. Defaults to BenchmarkTools.DEFAULT_PARAMETERS.evals = 1.
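(For reference, these parameters can be passed in the macro call, as in the examples below, or set on a @benchmarkable object; a minimal sketch using the documented BenchmarkTools API:)

using BenchmarkTools
b = @benchmarkable rand(10)  # build the benchmark without running it
tune!(b)                     # automatically pick a good evals value
trial = run(b)               # run it; samples/evals come from b.params
minimum(trial).time          # minimum sample time, in nanoseconds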
In the thread linked above, @baggepinnen quoted the following explanation (not sure of the source):
The reasoning behind our definition of “sample” may not be obvious to all readers. If the time to execute a benchmark is smaller than the resolution of your timing method, then a single evaluation of the benchmark will generally not produce a valid sample. In that case, one must approximate a valid sample by recording the total time t it takes to record n evaluations, and estimating the sample’s time per evaluation as t/n.
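In code, I read that as the following sketch of how a single sample is approximated (one_sample is my own hypothetical name, not a BenchmarkTools internal, and n is just “the number of evaluations”, whichever parameter that corresponds to):

function one_sample(f, n)
    t0 = time_ns()      # one clock read before the evaluation loop
    for _ in 1:n
        f()             # n evaluations timed together
    end
    t = time_ns() - t0  # total time t for the n evaluations, in ns
    return t / n / 1e9  # estimated time per evaluation, in seconds
end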
So, if I am reading this correctly, then calling @belapsed f() evals=20 samples=100 will do something like
times = zeros(20)
for i in 1:20
    # average over the 100 "samples" (note the parentheses in
    # @elapsed(f()), so the macro times f() and not the generator)
    times[i] = sum(@elapsed(f()) for j in 1:100) / 100
end
minimum(times)  # ... then minimum over the 20 "evals"
OTOH, the following REPL session seems to suggest the opposite, namely that we take the average over evals and then the minimum over samples:
julia> using BenchmarkTools
julia> @belapsed rand(10) evals=200000 samples=1
1.4441e-7
julia> @belapsed rand(10) evals=1 samples=200000
1.19e-7
Indeed, this seems to be the only interpretation that is consistent with the definition of @belapsed as a minimum time, since the default value of evals is one.
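If that is right, then (ignoring setup, warmup, and the seconds budget) @belapsed should behave roughly like the sketch below; belapsed_model is my own name for it, not anything from BenchmarkTools:

function belapsed_model(f; evals, samples)
    times = zeros(samples)
    for i in 1:samples
        t0 = time_ns()
        for _ in 1:evals
            f()
        end
        # each sample is the average time per evaluation, in seconds
        times[i] = (time_ns() - t0) / evals / 1e9
    end
    return minimum(times)  # minimum over samples
end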
Questions:

- Is it correct to say that @belapsed takes the average over evals, then the minimum over samples?

- If I am trying to benchmark a function that doesn’t require any setup, is it generally safe to leave evals alone and manipulate only samples?

- Suppose there is a setup clause. Then (as I understand it) calling @belapsed f(a) setup=(a=rand()) evals=20 samples=100 is equivalent to drawing 100 random a values, computing f(a) 20 times for each a value, taking the average over the 20 reps, and then taking the minimum over the 100 instances. What if I want to switch the order of the average and the minimum? That is: draw 100 a values, compute f(a) 20 times for each a value, take the minimum over the 20 reps, then take the average over the 100 instances. I am currently doing this with my own nested for loop and the vanilla @elapsed macro (see the sketch after this list); it looks like this isn’t doable directly with the syntax of BenchmarkTools.
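Roughly, my current hand-rolled version looks like this (min_then_mean and genproblem are placeholder names; this is also the procedure described in the next paragraph):

function min_then_mean(f, genproblem; instances=100, reps=20)
    minima = zeros(instances)
    for i in 1:instances
        a = genproblem()  # fresh problem instance, i.e. my "setup"
        # minimum over the reps for this instance
        minima[i] = minimum(@elapsed(f(a)) for _ in 1:reps)
    end
    return sum(minima) / instances  # average of the per-instance minima
end
# e.g.: min_then_mean(sort, () -> rand(1000))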
(FWIW, for actually benchmarking functions in a practical context, I realize it would be better to use the seconds parameter and tune!(). The reason I ask is that I need to describe my benchmarking method precisely and reproducibly in an academic paper for a non-Julian audience, using language such as: “We tested our algorithm on 100 randomly generated problem instances. For each instance, we ran the algorithm 20 times and recorded the minimum as the computation time. Then, in Table 1, we report the average of the 100 minima.”)
Another example that doesn’t seem to clarify much:
using BenchmarkTools

function benchmarkcomparison()
    # hand-rolled: minimum over 20 reps of the average of 100 individual
    # @elapsed timings (@elapsed reads the clock around every single call,
    # whereas BenchmarkTools times all evals of a sample with one clock read)
    t1 = minimum(sum(@elapsed(rand(10)) for _ in 1:100) for _ in 1:20) / 100
    # 20 samples of 100 evals each (minimum over samples of the per-sample
    # averages, if the interpretation above is right)
    t2 = @belapsed rand(10) evals=100 samples=20
    # 100 samples of 20 evals each
    t3 = @belapsed rand(10) evals=20 samples=100
    @show t1
    @show t2
    @show t3
end
benchmarkcomparison()
# t1 = 1.494300000000001e-7
# t2 = 1.0133e-7
# t3 = 1.0070000000000001e-7