Apparently I am not the first one to be confused about the difference between the `samples` and `evals` parameters in BenchmarkTools.jl (see this earlier thread: BenchmarkTools).
(I know that `setup` is run anew for each of the `samples` but not for each of the `evals`, but this is not what my question is about.)
I am interested primarily in the `@belapsed` macro, which according to the documentation is a minimum over samples and/or evals:

> `@belapsed` returns the minimum time in seconds.
The documentation also says:

> - `samples`: The number of samples to take. Execution will end if this many samples have been collected. Defaults to `BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000`.
> - …
> - `evals`: The number of evaluations per sample. For best results, this should be kept consistent between trials. A good guess for this value can be automatically set on a benchmark via `tune!`, but using `tune!` can be less consistent than setting `evals` manually. Defaults to `BenchmarkTools.DEFAULT_PARAMETERS.evals = 1`.
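(For reference, those defaults can be inspected directly in the REPL; the values below just restate what the documentation says:)

```julia
julia> using BenchmarkTools

julia> BenchmarkTools.DEFAULT_PARAMETERS.samples
10000

julia> BenchmarkTools.DEFAULT_PARAMETERS.evals
1
```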
In the thread linked above, @baggepinnen quoted the following explanation (not sure of the source):

> The reasoning behind our definition of “sample” may not be obvious to all readers. If the time to execute a benchmark is smaller than the resolution of your timing method, then a single evaluation of the benchmark will generally not produce a valid sample. In that case, one must approximate a valid sample by recording the total time `t` it takes to record `n` evaluations, and estimating the sample’s time per evaluation as `t/n`.
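In other words, a single sample with `n` evals is (as I understand it) effectively measured like this, where `f` stands for the benchmarked function (a rough sketch of my understanding, not BenchmarkTools’ actual implementation):

```julia
# Rough sketch: one "sample" formed from n evaluations of f,
# per the quoted explanation (not BenchmarkTools' actual code).
function one_sample(f, n)
    t = @elapsed for _ in 1:n
        f()
    end
    return t / n   # estimated time per evaluation for this sample
end
```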
So, if I am reading this correctly, then calling `@belapsed f() evals=20 samples=100` will do something like

```julia
# First interpretation: average over the 100 "samples",
# then minimum over the 20 "evals".
times = zeros(20)
for i in 1:20
    times[i] = sum(@elapsed f() for j in 1:100) / 100
end
minimum(times)
```
OTOH, the following REPL session seems to suggest the opposite, namely that we take the average over `evals` and then the minimum over `samples`:
```julia
julia> using BenchmarkTools

julia> @belapsed rand(10) evals=200000 samples=1
1.4441e-7

julia> @belapsed rand(10) evals=1 samples=200000
1.19e-7
```
Indeed, this seems to be the only interpretation that is consistent with the definition of `@belapsed` as a minimum time, since the default value of `evals` is one.
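Under that second reading, `@belapsed f() evals=20 samples=100` would instead behave roughly like the following (again just a sketch of my understanding, with `f` a placeholder for the benchmarked function):

```julia
# Second interpretation: average over the 20 evals within each sample,
# then minimum over the 100 samples.
times = zeros(100)
for i in 1:100
    t = @elapsed for _ in 1:20
        f()
    end
    times[i] = t / 20
end
minimum(times)
```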
Questions:

- Is it correct to say that `@belapsed` takes the average over `evals`, then the minimum over `samples`?
- If I am trying to benchmark a function that doesn’t require any setup, is it generally safe to leave `evals` alone and manipulate only `samples`?
- Suppose there is a setup clause. Then (as I understand it) calling `@belapsed f(a) setup=(a=rand()) evals=20 samples=100` is equivalent to drawing 100 random `a` values, computing `f` 20 times for each `a` value, taking the average over the 20 reps, then taking the minimum over the 100 instances. What if I want to switch the order of the average and the minimum? That is, draw 100 `a` values, compute `f` 20 times for each `a` value, take the minimum over the 20 reps, then take the average over the 100 instances? I am currently doing this with my own nested for loop and the vanilla `@elapsed` macro (see the sketch just after this list); it looks like this isn’t doable directly with the syntax of BenchmarkTools.
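For reference, my hand-rolled version looks roughly like this (a minimal sketch; `f`, the `rand()` setup, and the name `min_then_mean` are stand-ins for my actual function, problem-instance generator, and code):

```julia
using Statistics  # for mean

# Minimum over 20 reps per problem instance, then average over 100 instances,
# using plain @elapsed instead of BenchmarkTools.
function min_then_mean(f; instances = 100, reps = 20)
    mins = zeros(instances)
    for i in 1:instances
        a = rand()  # one problem instance (the "setup")
        mins[i] = minimum(@elapsed(f(a)) for _ in 1:reps)
    end
    return mean(mins)  # average of the per-instance minima
end
```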
(FWIW, for actually benchmarking functions in a practical context, I realize it would be better to use the `seconds` parameter and `tune!()`. The reason I ask this question is that I need to describe my benchmarking method precisely and reproducibly in an academic paper for a non-Julian audience, using language such as: “We tested our algorithm on 100 randomly generated problem instances. For each instance, we ran the algorithm 20 times and recorded the minimum as the computation time. Then, in Table 1, we report the average of the 100 minima.”)
Another example that doesn’t seem to clarify much:

```julia
using BenchmarkTools

function benchmarkcomparison()
    # hand-rolled: minimum over 20 repetitions of the average over 100 timings
    # (the division by 100 is pulled outside the minimum)
    t1 = minimum(sum(@elapsed rand(10) for _ in 1:100) for _ in 1:20) / 100
    t2 = @belapsed rand(10) evals=100 samples=20
    t3 = @belapsed rand(10) evals=20 samples=100
    @show t1
    @show t2
    @show t3
end

benchmarkcomparison()
# t1 = 1.494300000000001e-7
# t2 = 1.0133e-7
# t3 = 1.0070000000000001e-7
```