BenchmarkTools

I am trying to use BenchmarkTools in the proper way. Consider the following, executed in the REPL:

using BenchmarkTools

BenchmarkTools.DEFAULT_PARAMETERS.samples = 1
BenchmarkTools.DEFAULT_PARAMETERS.evals = 1
#BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.50

@benchmark sin(3.)

with results:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.325 ns (0.00% GC)
  median time:      2.325 ns (0.00% GC)
  mean time:        2.325 ns (0.00% GC)
  maximum time:     2.325 ns (0.00% GC)
  --------------
  samples:          1
  evals/sample:     1000

Why are there 1000 evals per sample? I set the number of evals to 1. I do not understand.

Also, I do not understand the difference between a sample and an evaluation. Any insight is appreciated. Thanks.

1 Like

Did you look at the parameter explanation in the manual? The default value of evals is already one.
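
For reference, you can inspect the current defaults directly in the REPL (on a stock install, evals should read 1):

julia> using BenchmarkTools

julia> BenchmarkTools.DEFAULT_PARAMETERS.evals
1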

1 Like

I am not sure why the parameters are not respected, but I can answer your other question.

I am assuming you have read the manual: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md

The difference between samples and evals has to do with how the measurement is made. For statistical purposes, many measurements must be made (and saved); these are the samples.

Sometimes, however, a function executes so quickly that running it a single time gives an inaccurate sample. This is where evals/sample comes in. Each sample actually consists of timing the expression multiple times in a loop (and then dividing by the number of repetitions). Roughly, the execution of @benchmark for n_samples and n_evals looks like

samples = zeros(n_samples)
for i in 1:n_samples
    # time n_evals back-to-back evaluations as a single measurement...
    t = @elapsed for j in 1:n_evals
        run_your_code()
    end
    # ...and record the average time per evaluation as one sample
    samples[i] = t / n_evals
end

Also note that if you use a setup parameter, e.g. to randomly choose inputs, like so: @benchmark sin(x) setup=(x=rand()), then each sample will re-execute the setup block and therefore use a different random number.
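
For example, a minimal sketch of that behavior (the rand() input here is just an illustration):

using BenchmarkTools

# The setup block runs once per sample, so every sample times sin on a
# freshly drawn random input; all evals within one sample share that x.
@benchmark sin(x) setup=(x=rand())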

As an aside, please note this section of the manual when benchmarking simple functions: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#understanding-compiler-optimizations

For sin, see the difference between using setup and passing a hard-coded constant (which the compiler can constant-fold away):

julia> @benchmark sin(x) setup=(x=3)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.299 ns (0.00% GC)
  median time:      12.399 ns (0.00% GC)
  mean time:        12.632 ns (0.00% GC)
  maximum time:     43.801 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark sin(3)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.999 ns (0.00% GC)
  median time:      1.100 ns (0.00% GC)
  mean time:        1.093 ns (0.00% GC)
  maximum time:     18.701 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
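
Another way to defeat the constant folding (rough sketch; timings vary by machine) is to hide the value behind a Ref and interpolate it:

x = 3.0

# Interpolating Ref(x) and dereferencing it inside the expression keeps
# the compiler from folding sin of a known constant at compile time.
@benchmark sin($(Ref(x))[])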
2 Likes

Thanks. Yes, I have read most of the manual. I was wondering whether you tried setting evals to one? Your example has evals set to 1000. As benchmarks take longer, evals can safely be decreased. I am also playing with the maximum time allowed.

From reading the manual, it does seem like setting the default evals to 1 should have worked. It might be worth opening an issue in BenchmarkTools to either fix the behavior of the default evals or clarify the manual.

In the meantime, you can always do @benchmark sin(3.0) evals=1 to force the number of evals to actually be 1.
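
For instance, a sketch that ties back to the parameters in the first post (per-call keywords take precedence over BenchmarkTools.DEFAULT_PARAMETERS):

using BenchmarkTools

# These keyword parameters apply to this one benchmark run only.
@benchmark sin(3.0) samples=1 evals=1 seconds=2.5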

1 Like

I’ve had this problem before, too, and just ignored it. Thank you for bringing this up! Hopefully it can result in a bugfix or a better docstring.

1 Like

I just tried @benchmarkable. For example:

b = @benchmarkable sin(3.) samples=15  evals=10
run(b)

In that way, I can set up multiple benchmarks with different parameters and could even do so programmatically if I had many of them. I would feel comfortable with the parameter choices if the median and mean are within 10 percent of each other. The more work per evaluation, the fewer evaluations are required. These tools are growing on me.
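
For the programmatic case, a BenchmarkGroup seems like a natural fit (a sketch; the list of functions is only an illustration):

using BenchmarkTools

# Build a suite of benchmarks in a loop, each with its own parameters.
suite = BenchmarkGroup()
for f in (sin, cos, tan)
    suite[string(f)] = @benchmarkable $f(3.0) samples=15 evals=10
end

# run executes every benchmark in the group and returns the results.
results = run(suite)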

2 Likes