Well, at least for coverage it’s quite usual to lower the optimisation level. I have run coverage tests in C++ projects, and I also had to prevent inlining and various other optimisations to get accurate coverage.
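For example, with GCC or Clang that typically means something like the following (flags shown purely for illustration; adapt them to your own build system):

```
g++ -O0 -fno-inline --coverage -o tests tests.cpp
```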
All this should, of course, be done in a separate CI job with its own build configuration.
So yes, I fully agree: for performance tests it’s nonsense to lower the optimisation level.
We created a macro @bench that can be placed in front of our existing @testset macros. We can then run our test suite in two modes: without benchmarking, in which case @bench does nothing, and with benchmarking, in which case @bench benchmarks the code within the @testset using BenchmarkTools with configurable settings.
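For anyone curious how such a macro might look, here is a minimal sketch of the idea (this is not our actual implementation; the RUN_BENCHMARKS environment variable and the run_bench helper are made up for illustration):

```julia
using Test
using BenchmarkTools

# Hypothetical switch; the real package makes the benchmark settings configurable.
const BENCH_ENABLED = get(ENV, "RUN_BENCHMARKS", "false") == "true"

# Benchmark a zero-argument closure, report the best estimate, then run it
# once more so the tests themselves still execute.
function run_bench(f)
    trial = @benchmark ($f)() samples = 10 evals = 1
    best = minimum(trial)
    @info "benchmark" time_ns = best.time memory_bytes = best.memory
    return f()
end

# With benchmarking enabled, wrap the @testset in a closure and benchmark it;
# otherwise expand to the @testset unchanged, so @bench costs nothing.
macro bench(testset_expr)
    BENCH_ENABLED || return esc(testset_expr)
    return :(run_bench(() -> $(esc(testset_expr))))
end

# Usage: simply prefix an existing @testset.
@bench @testset "sorting" begin
    @test sort([3, 1, 2]) == [1, 2, 3]
end
```

In the real setup the results would of course be recorded per @testset and compared across commits, rather than just logged.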
For every commit, we simultaneously start one build without benchmarking (just tests; this finishes fairly quickly) and one build with benchmarking (which takes longer, ~20 minutes). This way, we get feedback as soon as possible if any test fails, and within ~20 minutes we can ensure that CPU time and memory did not regress. If there’s ever a regression, we know the exact commit in which it occurred, and from within the benchmark report, we can click through to view the commit diff.
It should be noted that we are focusing on major regressions here, typically resulting from bugs or type instability introduced during feature work or refactoring.
By using this @bench macro, we make it trivial to enable benchmarking (no need to write and maintain separate benchmarking scripts), and we get very good benchmarking coverage (pretty much all tests are also benchmarked). We can also reuse the same hierarchy and structure that we already have for our @testsets.
Below is a sample screenshot of our benchmark report. The test hierarchy is on the left, and each commit appends a new column (split into two cells for CPU / memory) on the right. You can see how four tests regressed in the second-to-last commit (red cells; CPU and memory grew by several hundred percent). This was caused by a bug one of our developers introduced that might otherwise have gone undetected.
Unfortunately, the package is licensed under the GPL, which means that other package authors will not be able to use the @bench macro without relicensing their own package under the GPL.
Hi Dilum, I’ve created a new repository with just the benchmarking utilities (and a trivial test application that can be used for testing) under the MIT license here:
Well, I noticed (on my very old laptop) that a lower optimization level (-O1, if I recall) can be much faster. If your test scripts are only checking for correctness (and the failures you are looking for would not stem from compiler bugs), and/or if your test data is small, then I would consider this.
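For example, you can pass the flag directly when running a standalone test script, or forward it to the test process via Pkg.test’s julia_args keyword (the paths here are illustrative):

```
# standalone test script
julia -O1 test/runtests.jl

# or via Pkg, forwarding the flag to the spawned test process
julia --project -e 'using Pkg; Pkg.test(julia_args=["-O1"])'
```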