Thanks, I will be using this. Since the functionality is identical and the only difference appears to be strictly superior performance, why was this released as a new package instead of a new version of BenchmarkTools.jl?
This is great news, I look forward to trying it out. The tuning time in BenchmarkTools just kills me and I’ve found the system for saving pre-tunes fragile and fiddly.
Have you written about, or could you explain a bit, how Chairmarks achieves consistent benchmarks without all the tuning?
Other question: you say that RegressionTests.jl is still unstable; what are the odds that future breaking releases will be able to read old formats? The thought of having to modify code doesn’t bother me at all, but losing access to the old benchmarks is something I would care more about.
why was this released as a new package instead of a new version of BenchmarkTools.jl?
A few reasons
1
The API is slightly different. For example, Chairmarks runs benchmarks in the scope of the calling code while BenchmarkTools runs them in global scope, so the example in the OP would error with “ERROR: UndefVarError: n not defined in Main” under BenchmarkTools (see the short sketch below).
Consequently, this would require a breaking release. Because BenchmarkTools remains a viable, useful, and widely used package, and especially because it is perfectly reasonable for folks to use Chairmarks and BenchmarkTools simultaneously, I think the names Chairmarks 1.x and BenchmarkTools 1.x make more sense than the names BenchmarkTools 2.x and BenchmarkTools 1.x.
2
Chairmarks is not (yet) a strict improvement over BenchmarkTools: it does not have stable support for regression testing, and it lacks the extreme level of real-world testing, and therefore trustworthiness, that BenchmarkTools has developed. I think a 2.0 should be clearly or strictly better than a 1.0, and I wanted to release Chairmarks before getting to that point.
3
I’m lazy. BenchmarkTools has 1568 lines of code in src and Chairmarks has 384. I don’t want to develop a larger codebase when I can work with a smaller one instead. Given that I am starting from scratch (code-wise, not design-wise), I think it makes sense to reflect that in the package name.
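To make the scoping difference in point 1 concrete, here is a minimal sketch (the function name bench_rand is just for illustration; the error message is the one quoted above):

using Chairmarks, BenchmarkTools

function bench_rand(n)
    @b rand(n)          # Chairmarks: the macro runs in this local scope, so the local n is visible
    # @btime rand(n)    # BenchmarkTools: would throw the UndefVarError quoted above (locals are not visible)
    # @btime rand($n)   # BenchmarkTools workaround: interpolate the local value of n
end

bench_rand(1000)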
Doing @b x f(_) is fine for a single global x, but it looks a little painful to do @b (x,y,z) f(_[1], 1 + g(_[2]), h(_[3])) for multiple globals, or to always have to define a function explicitly … am I missing an easier syntax?
What does “interpolation in globals” mean? I heard this term often, but never understood what it means.
For me, interpolation means: if you have a value of y=1 at x=0 and y=2 at x=1, you can interpolate and get y=1.5 at x=0.5… But I have the impression that in this context interpolation has a completely different meaning…
That is the most generally understood meaning of interpolation.
String interpolation and interpolation into an expression are about substituting something specific (realized, available) for a placeholder. Usually, the placeholder is indicated by a ‘$’ prefix.
package = "Chairmarks"
author = "Lilith Hafner"
about = "$(package) by $(author)"
about == "Chairmarks by Lilith Hafner"
When timing small, fast functions with BenchmarkTools, the results are frequently more accurate when the arguments are interpolated. This tells the tool to keep the values as constants.
using BenchmarkTools
x = 2.0
y = 17
@btime x^y
# 21.944 ns (1 allocation: 16 bytes)
# 262144.0
@btime $x^$y
# 7.600 ns (0 allocations: 0 bytes)
# 262144.0
It would, but it would be even more convenient to support interpolation directly in @b, since it’s such an incredibly common need when benchmarking.
This is a very useful — and widely used — feature of @btime from BenchmarkTools.jl — why not just copy it? (Issue Chairmarks#62.)
It means that the value of the global is inserted directly into the abstract syntax tree, so that the benchmark code doesn’t refer to that value via the name of the global anymore.
Referring to a global variable x via the symbol/binding x is generally type unstable: the compiler assumes that x can be changed to a different value with a different type at any time, so benchmarking code that uses the binding x is artificially slow (this is right at the top of the performance tips).
If you do @btime f($x), then x is evaluated to its value when the macro is expanded — so by the time the compiler sees it, it is using the value directly and not the name x anymore, and the value is type-stable.
For example, if x = 3 then @btime f($x) is equivalent to @btime f(3).
However, the simplicity of that example may be deceptive — $x is pasting the value into the abstract syntax tree after parsing, which is not the same as pasting the value into the code as a string before parsing. If x = [1,2,3], then @btime f($x) is not the same as string interpolation @btime f([1,2,3]), which would allocate a new array [1,2,3] each time f is called during the benchmark loop. Instead, it uses the same array, given by the value of x when the macro is expanded, in every call (almost as if you did let x=x; @btime f(x); end to assign x to a temporary local variable during the benchmark).
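A small example of the difference (exact timings omitted, since they vary by machine):

using BenchmarkTools

x = [1, 2, 3]

@btime sum($x)          # splices in the array bound to x; the same array is reused, 0 allocations
@btime sum([1, 2, 3])   # builds a fresh array on every evaluation, so each run allocates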
This is super neat. Thanks for packaging it up and making it so pleasant and easy to use! My biggest pain point with BenchmarkTools is that @ballocated in particular feels way slower than it needs to be. This speedup sure is nice for some test suites I have that do maybe 20-30 checks for functions to be non-allocating:
julia> using Chairmarks, BenchmarkTools
julia> f() = rand(10)
julia> @time ((@b f()).bytes) # have run before in this REPL
0.142958 seconds (828.68 k allocations: 112.490 MiB, 2.93% gc time, 33.72% compilation time)
144.0
julia> @time (@ballocated f()) # also have run before
2.046662 seconds (9.91 M allocations: 1.319 GiB, 44.94% gc time, 1.41% compilation time)
144
Can you explain why this is so much faster than BenchmarkTools?
Clever algorithms? Different tradeoffs? Clever implementation tricks? Or is benchmarktools just overly conservative with respect to eval numbers?
I took a short look at the code, and both are just using time_ns(), with the obvious limitations (jitter, ccall-forced register spill, bias/warmup effects).
So RDTSC inline-assembly or performance counter magic is not your trick.
Though it seems better to either use something like AllocCheck.jl, or to just run the function a few times and see if any allocations occur, since @ballocated is going to waste time trying to produce accurate timing results and only returns the allocations from a single run.
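For what it’s worth, the “just run it a few times and check” idea can be sketched roughly like this (the helper name allocates and the trial count are made up for illustration):

# Report whether any of a handful of calls to `f` allocate.
function allocates(f; trials = 5)
    f()                                  # warm up so compilation allocations are not counted
    for _ in 1:trials
        (@allocated f()) > 0 && return true
    end
    return false
end

allocates(() -> 1 + 1)     # false
allocates(() -> rand(10))  # true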
I just wrote a little something here; the gist of it is that tuning is not that hard. All we need to do is figure out roughly how many times we can run the function in 30 microseconds. And we only need to do that if evals is not specified.
That said, it’s not perfectly consistent. Especially at very low runtime budgets, sometimes it will report spurious results.
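For the curious, the core of that idea fits in a few lines (this is not Chairmarks’ actual implementation; rough_evals is a made-up name, and the 30 µs budget is the one mentioned above):

# Estimate how many evaluations of `f` fit in a ~30 microsecond batch.
function rough_evals(f; budget_ns = 30_000)
    f()                                    # warm up (compile) before timing
    t0 = time_ns()
    f()
    elapsed = max(Int(time_ns() - t0), 1)  # guard against a zero reading
    return max(1, budget_ns ÷ elapsed)
end

rough_evals(() -> sum(rand(100)))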
you say that RegressionTests.jl is still unstable, what are the odds that future breaking releases will be able to read old formats?
0%. RegressionTests.jl does not (yet) save results. In order to maximize precision and minimize false positives, each comparison run is a randomized controlled trial. If you save results that were run on an idle system and later compare them to results gathered with Slack running in the background, it’s plausible you’ll see a false positive. I try very hard to avoid false positives in RegressionTests.jl, so it currently does not support that sort of storage and retrieval. If and when I add performance data storage, I’ll try to ensure that compatibility, though.
Looks really cool! I wonder if the local-scope design + speed could make it feasible for use within meta-algorithms. E.g. within some optimize(f, bounds) entrypoint, one could benchmark f and dispatch to a different algorithm depending on whether or not f is “fast” or “slow” relative to some overall time budget for the optimization procedure.
edit: to that end, it might be nice to expose a function-based API (like the internal benchmark function) rather than only a macro-based one.
julia> function _sort!(x::AbstractVector; by)
           isempty(x) && return x
           if (@b rand(x) by seconds = 1e-4).time > 1e-6
               permute!(x, sortperm(by.(x)))
           else
               sort!(x; by)
           end
           x
       end
[ Info: Loading Chairmarks ...
_sort! (generic function with 1 method)
julia> @b sort!(rand(100_000), by=sqrt)
6.049 ms (6 allocs: 1.526 MiB)
julia> @b _sort!(rand(100_000), by=sqrt)
6.226 ms (60 allocs: 1.532 MiB)
julia> @b sort!(rand(100_000), by=x -> sum(sqrt(x+i) for i in 1:1000))
3.344 s (6 allocs: 1.526 MiB, without a warmup)
julia> @b _sort!(rand(100_000), by=x -> sum(sqrt(x+i) for i in 1:1000))
85.936 ms (70 allocs: 3.821 MiB)
I’d be open to adding a function API in addition if someone has a use case where it’s necessary or helpful.
Thank you @Lilith for this amazing contribution!
Important fact about BenchmarkTools.jl: from what I can tell, no one is actively maintaining it besides me, and I want to step down too. I was handed the package about a year ago, and I have overseen two important releases, but I know too little about metaprogramming, and I don’t really have the time anyway.
Thus, if a new option emerges from an enthusiastic and trusted developer, perhaps it would make sense as a community to reflect on what we want to be our default from now on? It would also help to untangle the web of CI benchmarking utilities (PkgBenchmark.jl, BenchmarkCI.jl, AirSpeedVelocity.jl, PkgJogger.jl and I might be missing some).
Appreciate your initiative and transparency @gdalle. Stepping down is as important as stepping up. The community needs to know which packages are actively maintained.
Perhaps you should submit a patch release in BenchmarkTools.jl alerting users about the lack of maintainers. You can even point to Chairmarks.jl if that is the best approach forward.
Take a look at Formatting.jl for a deprecation example: