The difference between samples and evals has to do with how the measurement is made. For statistical purposes, many measurements must be made (and saved); these are the samples.
Sometimes, however, a function executes so quickly that running it a single time gives an inaccurate sample. This is where evals/sample comes in. Each sample actually consists of timing the expression multiple times in a loop (and then dividing by the number of times). Roughly, the execution of @benchmark for n_samples and n_evals looks like
samples = zeros(n_samples)
for i in 1:n_samples
    # time n_evals executions as one block...
    t = @elapsed for j in 1:n_evals
        run_your_code()
    end
    # ...and record the per-execution average as one sample
    samples[i] = t / n_evals
end
Also note that if you use a setup parameter, e.g. to randomly choose inputs, like so: @benchmark sin(x) setup=(x=rand()), then each sample will re-execute the setup block and therefore use a different random number.
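Schematically, the setup block slots into the loop above like this (a sketch of the semantics, not the actual BenchmarkTools internals):

samples = zeros(n_samples)
for i in 1:n_samples
    x = rand()                       # setup: re-run once per sample
    t = @elapsed for j in 1:n_evals
        sin(x)                       # every eval within a sample sees the same x
    end
    samples[i] = t / n_evals
end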
Thanks. Yes, I have read most of the manual. I was wondering whether you tried setting evals to one? Your example has evals set to 1000. As benchmarks take longer per evaluation, evals can safely decrease. I am also playing with the maximum time allowed (the seconds parameter).
From reading the manual, it does seem like setting the default evals to 1 should have worked. It might be worth opening an issue in BenchmarkTools to either fix the behavior of the default evals or clarify the manual.
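For reference, the default in question is the evals field of BenchmarkTools.DEFAULT_PARAMETERS, so I assume the attempt looked something like:

using BenchmarkTools

BenchmarkTools.DEFAULT_PARAMETERS.evals = 1   # global default for subsequent benchmarks
@benchmark sin(3.0)                           # if evals is still re-tuned automatically here,
                                              # that would explain the behavior and be worth reporting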
In the meantime, you can always do @benchmark sin(3.0) evals=1 to force the number of evals to actually be 1.
using BenchmarkTools
# 15 samples, each averaging over 10 evaluations
b = @benchmarkable sin(3.0) samples=15 evals=10
run(b)
In that way, I can set up multiple benchmarks with different parameters and could even do so programmatically if I had many of them. I would feel comfortable with the parameter choices if the median and mean are within 10 percent of each other. The more work per evaluation, the fewer evaluations are required. These tools are growing on me.
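For example, a minimal sketch of the kind of programmatic setup I mean (the (samples, evals) pairs are arbitrary, and I am assuming run accepts keyword overrides for these parameters as it does for seconds):

using BenchmarkTools
using Statistics: mean, median

b = @benchmarkable sin(3.0)
for (s, e) in [(15, 10), (100, 1), (50, 5)]
    trial = run(b, samples=s, evals=e)
    m  = mean(trial).time     # nanoseconds
    md = median(trial).time
    # accept the parameter choice if mean and median agree to within 10%
    println("samples=$s evals=$e mean=$m median=$md ok=$(abs(m - md) / md < 0.1)")
end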