Hello folks!

I’m announcing Chairmarks.jl, version 1.0 (docs).

It’s a benchmarking package that aims to be hundreds of times faster than BenchmarkTools without compromising on precision.

Usage is pretty similar to BenchmarkTools (see differences), but enjoy the performance and usability improvements!

(@v1.11) pkg> add Chairmarks

julia> using Chairmarks
Precompiling Chairmarks
  1 dependency successfully precompiled in 1 seconds. 2 already precompiled.

julia> @b rand(1000)
7.584 μs (3 allocs: 7.875 KiB)

julia> f(n) = @b rand(n) seconds=.001
f (generic function with 1 method)

julia> @time f.(1:1000)
  1.099479 seconds (8.34 M allocations: 7.109 GiB, 16.47% gc time, 1.60% compilation time)
1000-element Vector{Chairmarks.Sample}:
 13.333 ns (2 allocs: 64 bytes)
 13.828 ns (2 allocs: 80 bytes)
 15.375 ns (2 allocs: 80 bytes)
 16.171 ns (2 allocs: 96 bytes)
 ⋮
 708.320 ns (3 allocs: 7.875 KiB)
 711.640 ns (3 allocs: 7.875 KiB)
 701.923 ns (3 allocs: 7.875 KiB)

Thanks, I will be using this. Since the functionality is identical and the only difference appears to be a strictly superior performance, why was this released as a new package instead of a new version of BenchmarkTools.jl?


This is great news, I look forward to trying it out. The tuning time in BenchmarkTools just kills me and I’ve found the system for saving pre-tunes fragile and fiddly.

Have you written about, or could you explain a bit, how Chairmarks achieves consistent benchmarks without all the tuning?

Other question: you say that RegressionTests.jl is still unstable, what are the odds that future breaking releases will be able to read old formats? The thought of having to modify code doesn’t bother me at all but losing access to the old benchmarks, that I would care more about.


why was this released as a new package instead of a new version of BenchmarkTools.jl?

A few reasons:


The API is slightly different (e.g. Chairmarks runs benchmarks in the scope of the calling code, while BenchmarkTools runs them in global scope. The example in the OP would error with “ERROR: UndefVarError: n not defined in Main” using BenchmarkTools)

Consequently, this would require a breaking release. Because BenchmarkTools remains a viable, useful, and widely used package, and especially because it is perfectly reasonable for folks to use Chairmarks and BenchmarkTools simultaneously, I think the names Chairmarks 1.x and BenchmarkTools 1.x make more sense than the names BenchmarkTools 2.x and BenchmarkTools 1.x.


Chairmarks is not (yet) a strict improvement over BenchmarkTools because it does not have stable support for regression testing, and because it lacks the extreme level of real-world testing, and therefore trustworthiness, that BenchmarkTools has developed. I think a 2.0 should be clearly or strictly better than a 1.0, and I wanted to release Chairmarks before getting to that point.


I’m lazy. BenchmarkTools has 1568 lines of code in src and Chairmarks has 384. I don’t want to develop a larger codebase when I can work with a smaller one instead. Given that I am starting from scratch (code wise, not design wise), I think it makes sense to reflect that in the package name.
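To make the scoping difference in the first point concrete, here's a small sketch (the BenchmarkTools lines are shown as comments, so the snippet only needs Chairmarks):

```julia
using Chairmarks

# Chairmarks expands in the calling scope, so the local `n` is visible:
f(n) = (@b rand(n)).time
f(100)  # a time in seconds

# With BenchmarkTools the benchmark runs in global scope, so the same
# pattern needs interpolation:
#   using BenchmarkTools
#   f(n) = @btime rand(n)    # ERROR: UndefVarError: `n` not defined
#   f(n) = @btime rand($n)   # works: $n splices in the value
```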


Is there a reason why you don’t support interpolation of globals, or is this a planned feature?

Doing @b x f(_) is fine for a single global x, but it looks a little painful to do @b (x,y,z) f(_[1], 1 + g(_[2]), h(_[3])) for multiple globals, or always having to define a function explicitly … am I missing an easier syntax?


If you really want interpolation, you can always do @eval @b f($x + g($y), h($z)). Would it help to document this?


What does “interpolation of globals” mean? I hear this term often, but I’ve never understood what it means.
For me, interpolation means that if you have a value of y=1 at x=0 and y=2 at x=1, you can interpolate and get y=1.5 at x=0.5… But I have the impression that in this context interpolation has a completely different meaning…


documenting always assists :slight_smile:


That is the most generally understood meaning of interpolation.
String interpolation and interpolation into an expression are about substituting something specific (realized, available) for a placeholder. Usually, the placeholder is indicated by a ‘$’ prefix.

package = "Chairmarks"
author = "Lilith Hafner"

about = "$(package) by $(author)"
about == "Chairmarks by Lilith Hafner"

When timing small, fast functions with BenchmarkTools, the results frequently are more accurate when the args are interpolated. This tells the tool to keep the values as constants.

using BenchmarkTools
x = 2.0
y = 17

@btime x^y
#  21.944 ns (1 allocation: 16 bytes)
#  262144.0

@btime $x^$y
#  7.600 ns (0 allocations: 0 bytes)
#  262144.0

It would, but it would be even more convenient to support interpolation directly in @b, since it’s such an incredibly common need when benchmarking.

This is a very useful — and widely used — feature of @btime from BenchmarkTools.jl — why not just copy it? (Issue Chairmarks#62.)

It means that the value of the global is inserted directly into the abstract syntax tree, so that the benchmark code doesn’t refer to that value via the name of the global anymore. Referring to a global variable x via the symbol/binding x is generally type-unstable — the compiler assumes that x can be changed to a different value with a different type at any time, so benchmarking code that uses the binding x is artificially slow (this is right at the top of the performance tips).

If you do @btime f($x), then x is evaluated to its value when the macro is expanded — so by the time the compiler sees it, it is using the value directly and not the name x anymore, and the value is type-stable.

For example, if x = 3 then @btime f($x) is equivalent to @btime f(3).

However, the simplicity of that example may be deceptive — $x is pasting the value into the abstract syntax tree after parsing, which is not the same as pasting the value into the code as a string before parsing. If x = [1,2,3], then @btime f($x) is not the same as string interpolation @btime f([1,2,3]), which would allocate a new array [1,2,3] each time f is called during the benchmark loop. Instead, it uses the same array, given by the value of x when the macro is expanded, in every call (almost as if you did let x=x; @btime f(x); end to assign x to a temporary local variable during the benchmark).

This is the same behavior as $ interpolation in @eval or in :(...) expressions — see the metaprogramming manual on interpolation.
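The distinction between splicing a value and pasting text can be seen directly on `Expr` objects, with no packages involved:

```julia
f(v) = sum(v)  # any function, just so the expressions refer to something
x = [1, 2, 3]

# $x splices the current *value* of x into the expression tree:
ex = :( f($x) )
ex.args[2] === x    # true: the expression holds the very same array object

# string interpolation followed by parsing builds a literal instead,
# which would construct a fresh [1, 2, 3] on every evaluation:
ex2 = Meta.parse("f($x)")
ex2.args[2] === x   # false: this is an Expr(:vect, ...), not the array
```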


This is super neat. Thanks for packaging it up and making it so pleasant and easy to use! My biggest pain point with BenchmarkTools is that @ballocated in particular feels way slower than it needs to be. This speedup sure is nice for some test suites I have that do maybe 20-30 checks for functions to be non-allocating:

julia> using Chairmarks, BenchmarkTools

julia> f() = rand(10)

julia> @time ((@b f()).bytes) # have run before in this REPL
  0.142958 seconds (828.68 k allocations: 112.490 MiB, 2.93% gc time, 33.72% compilation time)

julia> @time (@ballocated f()) # also have run before
  2.046662 seconds (9.91 M allocations: 1.319 GiB, 44.94% gc time, 1.41% compilation time)

Thanks again!


Are you aware of AllocCheck.jl (JuliaLang/AllocCheck.jl on GitHub)?
Here’s an example where it is used to infer that a function does not allocate, without even running it once :slight_smile:


Can you explain why this is so much faster than BenchmarkTools?

Clever algorithms? Different tradeoffs? Clever implementation tricks? Or is BenchmarkTools just overly conservative with respect to eval numbers?

I took a short look at the code, and both are just using time_ns(), with the obvious limitations (jitter, ccall-forced register spill, bias/warmup effects).

So RDTSC inline-assembly or performance counter magic is not your trick.



Thanks for mentioning this, was not aware :slight_smile:


You could make BenchmarkTools faster with something like this

julia> @time @ballocated f() gctrial=false evals=1;
  0.033119 seconds (120.40 k allocations: 5.596 MiB, 60.58% compilation time)
julia> @time @ballocated f();
  1.649802 seconds (10.27 M allocations: 1.367 GiB, 41.78% gc time, 1.22% compilation time)

Though it seems better to either use something like AllocCheck.jl, or just run the function a few times and see if any allocations occur, since @ballocated is going to waste time trying to produce accurate timing results and only returns the allocations from a single run.
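On the “just run it and check” approach, here's a minimal sketch in plain Base Julia (the warm-up call is what keeps compilation out of the count):

```julia
# a minimal "does it allocate?" check: warm up once so compilation
# doesn't count, then measure a single call
function allocates(f)
    f()                        # trigger compilation
    @allocated(f()) > 0
end

allocates(() -> rand(10))      # true: allocates a fresh Vector
allocates(() -> 1 + 1)         # false: pure arithmetic
```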


I just wrote a little something here; the gist of it is that tuning is not that hard. All we need to do is figure out roughly how many times we can run the function in 30 microseconds. And we only even need to do that if evals is not specified.

That said, it’s not perfectly consistent. Especially at very low runtime budgets, sometimes it will report spurious results.
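That tuning strategy can be sketched in a few lines of plain Julia (hypothetical names and doubling strategy, not Chairmarks' actual implementation):

```julia
# estimate how many evaluations fit in ~30 μs by doubling the count
# until the budget is exceeded (hypothetical sketch)
function calibrate_evals(f; budget_ns = 30_000)
    evals = 1
    while true
        t0 = time_ns()
        for _ in 1:evals
            f()
        end
        time_ns() - t0 > budget_ns && return evals
        evals *= 2
    end
end

calibrate_evals(() -> sum(rand(10)))  # some power of two, machine-dependent
```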

you say that RegressionTests.jl is still unstable, what are the odds that future breaking releases will be able to read old formats?

0%. RegressionTests.jl does not (yet) save results. In order to maximize precision and minimize false positives, each comparison run is a randomized controlled trial. If you save results that were run on an idle system and then compare them to results gathered with Slack running in the background, it’s plausible you’ll see a false positive. I try very hard to avoid false positives in RegressionTests.jl, so it currently does not support that sort of storage and retrieval. If and when I add performance data storage, I’ll try to ensure that compatibility, though.


This is a very useful — and widely used — feature of @btime from BenchmarkTools.jl — why not just copy it?

Interpolation requires constructing and compiling a new expression at runtime. This makes runtime budgets faster than about 50 ms impossible and causes a memory leak when running the same benchmark repeatedly in a loop (see BenchmarkTools.jl issue #339, “Memory leak when repeatedly benchmarking”).
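For a sense of why this is expensive: interpolation-style benchmarking has to rebuild the expression with the current values spliced in and push it through `eval` on every run, e.g.:

```julia
x = 2.0

# each pass constructs a fresh Expr with the values of x and i spliced in,
# then evaluates it; a benchmark loop doing this pays that cost every run
results = [@eval $x ^ $i for i in 1:5]
results == [2.0, 4.0, 8.0, 16.0, 32.0]  # true
```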


Can you explain why this is so much faster than BenchmarkTools?

See the docs: Explanations → “How is this faster than BenchmarkTools?”


Looks really cool! I wonder if the local-scope design + speed could make it feasible for use within meta-algorithms. E.g. within some optimize(f, bounds) entrypoint, one could benchmark f and dispatch to a different algorithm depending on whether or not f is “fast” or “slow” relative to some overall time budget for the optimization procedure.

edit: to that end, it might be nice to expose a function-based API (like the internal benchmark function) rather than only a macro-based one.


For sure!

julia> function _sort!(x::AbstractVector; by)
           isempty(x) && return x
           if (@b rand(x) by seconds=1e-4).time > 1e-6
               permute!(x, sortperm(by.(x)))
           else
               sort!(x; by)
           end
       end
[ Info: Loading Chairmarks ...
_sort! (generic function with 1 method)

julia> @b sort!(rand(100_000), by=sqrt)
6.049 ms (6 allocs: 1.526 MiB)

julia> @b _sort!(rand(100_000), by=sqrt)
6.226 ms (60 allocs: 1.532 MiB)

julia> @b sort!(rand(100_000), by=x -> sum(sqrt(x+i) for i in 1:1000))
3.344 s (6 allocs: 1.526 MiB, without a warmup)

julia> @b _sort!(rand(100_000), by=x -> sum(sqrt(x+i) for i in 1:1000))
85.936 ms (70 allocs: 3.821 MiB)

I’d be open to adding a function API in addition, if someone has a use case where it’s necessary/helpful.


Thank you @Lilith for this amazing contribution!
Important fact about BenchmarkTools.jl: from what I can tell, no one is actively maintaining it besides me, and I wanna step down too. I got the package handed to me about a year ago, and I have overseen two important releases, but I know too little about metaprogramming, and I don’t really have the time anyway.
Thus, if a new option emerges from an enthusiastic and trusted developer, perhaps it would make sense as a community to reflect on what we want to be our default from now on? It would also help to untangle the web of CI benchmarking utilities (PkgBenchmark.jl, BenchmarkCI.jl, AirSpeedVelocity.jl, PkgJogger.jl and I might be missing some).


Appreciate your initiative and transparency @gdalle. Stepping down is as important as stepping up. The community needs to know which packages are actively maintained.

Perhaps you should submit a patch release in BenchmarkTools.jl alerting users about the lack of maintainers. You can even point to Chairmarks.jl if that is the best approach forward.

Take a look at Formatting.jl for a deprecation example: