BenchmarkTools.jl: @benchmark has big overhead


I want to use BenchmarkTools to compare different model builds using JuMP for linear programming. Doing this for bigger linear problems took longer than I imagined. When looking at the results of @benchmark I noticed that the time to run the code I want to benchmark did just account for a small part of the time the whole benchmarking process took.

Here is the code I used. The function bench_array() is what I want to benchmark and the function benchmark_process_A() guides the benchmarking process.

module b_Array

export benchmark_process_A

using JuMP, CPLEX, BenchmarkTools, JLD, Dates

BenchmarkTools.DEFAULT_PARAMETERS.seconds = 600
BenchmarkTools.DEFAULT_PARAMETERS.samples = 5

function bench_array(A::Array{Float64,2},b::Array{Float64,1},n::Int64)
    model = Model()
    @variable(model, 0 <= x[1:n] <= 1)
    @constraint(model, con[i = 1:n], sum(A[i, j] * x[j] for j = 1:n) <= b[i])
    @objective(model, Max, sum(x))
    return model

function benchmark_process_A()

    n = 10000
    A = rand(n,n)
    b = rand(n)

    println("Benchmarking Array started")
    println(Dates.format(now(), "HH:MM:SS"))
    benchmark = @benchmark bench_array($A,$b,$n)
    println("Benchmarking Array finished")
    println(Dates.format(now(), "HH:MM:SS"))

    return benchmark


When I used benchmark_process_A() in the REPL I got the following results:

julia> include("b_Array.jl")

julia> using .b_Array

julia> benchmark = benchmark_process_A()
Benchmarking Array started
Benchmarking Array finished
  memory estimate:  11.15 GiB
  allocs estimate:  600881
  minimum time:     64.891 s (7.13% GC)
  median time:      94.437 s (17.59% GC)
  mean time:        94.904 s (16.09% GC)
  maximum time:     136.016 s (17.68% GC)
  samples:          5
  evals/sample:     1

I also looked at the times for each sample and got that in total the evaluation of all samples has taken 7.909 m. While the manual time measurenment 13:36:49 - 14:04:24 gives around 27.360 m.

Now I wonder what is happening in the remaining 20 minutes. Are there any problems with the way I wrote the code? I tried to encapsulate everything in functions and used interpolation when using @benchmark. For any help I would be very thankful.

@benchmark runs code multiple times to get more accurate statistics.


I would recommend using BenchmarkTools for code that runs very fast and, therefore, are much more affected by noise. If you code takes about one minute you can just run it in a loop, and discard the first iteration (this assumes the different inputs you are passing do not change which internal functions are being called, so all of them are compiled in the first run).


This is also how I understand it. But if I reed the output correct there is one Trial with five samples and one evalutaion per sample. So in total the code runs 5 times at maximum time 136.016s. Using this estimate I get 5*136.016 s ~= 11.34 m. While the manual time measurement results in ~ 27.36m. And now I wonder what happens in the remaining time.

Do I reed the output wrong and the code is run more often or what exactly do you mean?

Thanks for your reply. Do you know if there is also an easy way to measure allocations using a loop?

@benchmark runs your code at least once before the benchmark in order to avoid measuring compilation time, and I think it may run it a few more times in order to estimate the correct number of evaluations and samples per evaluation. I suspect those initial evaluations are not counted in the benchmark results and may account for the missing time.


I suggest you looking to the documentation of @allocated and @timed.