Benchmarking with PkgBenchmark.jl

I have been working with PkgBenchmark.jl all day so far, and I have to say I am very happy with it. I went into it expecting lots of issues (because of the general vibe of this thread), but so far everything works smoothly and using it is also very simple. In fact, I did not expect comparing benchmarks between different commits to be that straightforward.


I did, however, encounter a cognitive barrier with how BenchmarkTools.jl itself works (which PkgBenchmark.jl uses as a backend).

I had a completely wrong intuition of how “tuning” works. In my mind, I thought it tries to figure out how many samples to take (after “warming up” the function) and saves that number for consistency between different runs. I am not sure where I got that idea, but for some reason I didn’t question it until today. It turns out tuning just takes care of estimating the evals per sample (which, to be fair, makes more sense).
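If you share that misconception, it is easy to check with plain BenchmarkTools.jl directly. A minimal sketch (sin is just a cheap stand-in function):

using BenchmarkTools

b = @benchmarkable sin(1.0)
b.params.evals    # 1, the default
tune!(b)          # runs the expression to estimate evals per sample
b.params.evals    # now large, since a single sin call is very cheap
b.params.samples  # untouched: tuning does not pick the sample count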

Anyway, this misunderstanding caused me a bit of a headache because I struggled to get reasonable benchmark results for my problem. The reason was that the function I am interested in benchmarking takes around 6 seconds on the first call (compilation) and around 170 microseconds on each subsequent call. The default time budget for an individual benchmark is a constant 5 seconds, which means that if I don’t invoke my function at least once before benchmarking, I only get one sample (the 6-second one).
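To make this concrete, here is a runnable toy version of my situation. fake_heavy and the sleep calls are made up to mimic the ~6 s first call and fast subsequent calls of my real function:

using BenchmarkTools

warmed_up = Ref(false)
function fake_heavy()
    if !warmed_up[]
        warmed_up[] = true
        sleep(6.0)   # mimic the one-time ~6 s compilation cost
    end
    sleep(0.001)     # stand-in for the fast steady-state call
end

b = @benchmarkable fake_heavy()
run(b)  # no warm-up has happened, so the ~6 s first sample already
        # exhausts the default 5-second budget: the Trial holds 1 sample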

Now the funny thing is that if I don’t have a “tune” file (which in this case I don’t even need), PkgBenchmark will create one, effectively invoking my function once before actually benchmarking it. Thus on the first commit I called it I got reasonable results. Now if I benchmarked again on a different commit (and the now existing tune file), I’d only get 1 sample of 6 seconds runtime.
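As an aside: if you want to throw away and regenerate that tuning data, benchmarkpkg accepts a retune keyword (at least if I read the source correctly), which has the same warm-up side effect:

julia> benchmarkpkg("WaveSimulator", retune=true)  # discard and recreate the tune file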

Anyway, long story short: make sure to manually set the seconds parameter appropriately for your problem if what you are interested in aren’t micro-benchmarks. Then everything works nicely.
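Continuing the toy example from above, that just means raising the budget (20 seconds and 100 samples happen to be the values I settled on):

warmed_up[] = false   # reset the toy so the first call is expensive again
b = @benchmarkable fake_heavy() seconds=20 samples=100
run(b)  # the ~6 s first sample now fits, and 99 fast samples follow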

Another tip, which isn’t really visible in the PkgBenchmark README but is apparent from looking at the code for @bench (see https://github.com/JuliaCI/PkgBenchmark.jl/blob/master/src/macros.jl#L32), is that you can pass parameters such as setup, teardown, seconds, etc. to @bench as well.


Here is some sample code from my benchmark/benchmarks.jl. You won’t be able to execute it because the package isn’t public, but it shows how I use the package right now. It’s surely not the final version, but it works:

using WaveSimulator
using PkgBenchmark

@benchgroup "simulation" ["simulate", "simulate!"] begin
    # run the same benchmark on three different compute resources
    for (resource, tags) in ((CPU1(), ["CPU", "CPU1"]),
                             (CPUThreads((100,1,1)), ["CPU", "CPUThreads"]),
                             (CUDALibs(), ["GPU"]))
        @benchgroup "$(typeof(resource).name.name)" tags begin
            @bench(
                "simulate!",
                simulate!(state, backend, sim),
                setup = begin
                    wave = UniformWave{3}(fmax=2e3)
                    sim = Simulator(wave, resource=$(resource), duration=0.01)
                    domain = BoxDomain(6,8,4, gamma=0.05)
                    f0 = WaveSimulator.gauss(domain)
                    backend = WaveSimulator.backend_init(sim.resource, domain, sim)
                    state   = WaveSimulator.state_init(f0, backend, domain, sim)
                end,
                teardown = begin
                    backend = nothing
                    state = nothing
                    gc()
                end,
                seconds = 20,  # budget large enough for the ~6 s first call
                samples = 100
            )
        end
    end
end

To store a result for your commit on the current machine, just call:

julia> using PkgBenchmark

julia> res = benchmarkpkg("WaveSimulator"); showall(res)
INFO: Running benchmarks...
Creating benchmark tuning file /home/csto/.julia/v0.6/.benchmarks/WaveSimulator/.tune.jld
File results of this run? (commit=c0be5c, resultsdir=/home/csto/.julia/v0.6/.benchmarks/WaveSimulator/results) (Y/n) y
INFO: Results of the benchmark were written to /home/csto/.julia/v0.6/.benchmarks/WaveSimulator/results/c0be5c6045d034316011623cb395ffccb18b8a08.jld
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "simulation" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: ["simulate", "simulate!"]
          "CUDALibs" => 1-element BenchmarkTools.BenchmarkGroup:
                  tags: ["GPU"]
                  "simulate!" => Trial(168.666 μs)
          "CPUThreads" => 1-element BenchmarkTools.BenchmarkGroup:
                  tags: ["CPU", "CPUThreads"]
                  "simulate!" => Trial(38.190 ms)
          "CPU1" => 1-element BenchmarkTools.BenchmarkGroup:
                  tags: ["CPU", "CPU1"]
                  "simulate!" => Trial(180.768 ms)

Now make changes to your package. It is then quite simple to compare the current state with some given commit:

julia> using PkgBenchmark

julia> cmp = judge("WaveSimulator", "c0be5c6")
INFO: Running benchmarks...
Using benchmark tuning data in /home/csto/.julia/v0.6/.benchmarks/WaveSimulator/.tune.jld
WARNING: /home/csto/.julia/v0.6/WaveSimulator is dirty, not attempting to file results...
INFO: Reading results for c0be5c from /home/csto/.julia/v0.6/.benchmarks/WaveSimulator/results
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "simulation" => 3-element BenchmarkGroup(["simulate", "simulate!"])

julia> showall(cmp)
1-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "simulation" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: ["simulate", "simulate!"]
          "CUDALibs" => 1-element BenchmarkTools.BenchmarkGroup:
                  tags: ["GPU"]
                  "simulate!" => TrialJudgement(+3.52% => invariant)
          "CPUThreads" => 1-element BenchmarkTools.BenchmarkGroup:
                  tags: ["CPU", "CPUThreads"]
                  "simulate!" => TrialJudgement(-0.17% => invariant)
          "CPU1" => 1-element BenchmarkTools.BenchmarkGroup:
                  tags: ["CPU", "CPU1"]
                  "simulate!" => TrialJudgement(+1.16% => invariant)