Tracking memory usage in unit tests (a lot worse in Julia 0.7 than 0.6)

Well, I’m not sure how other developers handle this, but I for one don’t run the full test suite on every commit. With BenchmarkTools you can create benchmark suites for specific parts of the code and just run those - you’ll need some setup, though, to only run the suites for the code you changed (or for whatever is affected by the change). Those suites should also only run once, since you don’t care about the time in those parts.
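
For instance, a granular suite could be organized roughly along these lines (the group names and benchmark bodies here are made up purely for illustration):

using BenchmarkTools

# Hypothetical granular suite: one group per area of the code base.
suite = BenchmarkGroup()
suite["linalg"] = BenchmarkGroup()
suite["linalg"]["matmul"] = @benchmarkable $(rand(100, 100)) * $(rand(100, 100))
suite["io"] = BenchmarkGroup()
suite["io"]["parse"] = @benchmarkable parse(Float64, "3.14")

# Tune and run only the group affected by your change:
tune!(suite["linalg"])
results = run(suite["linalg"], verbose = true)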

In general though, I don’t think checking everything for regressions on every commit is a good choice, especially because it really limits how quickly you can iterate on your code. Your options are probably limited to “run the tests less often” and “write granular test suites that can be run independently”. A combination of both is probably best.

Another idea would be to only benchmark unit tests that you already know are slow or allocation-heavy, instead of benchmarking everything all the time.

2 Likes

You might find the readme and the approach of this project useful:
https://github.com/JuliaCI/BaseBenchmarks.jl

1 Like

If the problem has a scale (e.g. number of observations, grid size) that leaves the algorithm invariant, you could run on a small scale to compile, then benchmark on the large one.
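
A minimal sketch of that idea, assuming a hypothetical solve(n) whose code paths do not depend on the problem size n:

# Stand-in for the real algorithm; the problem size does not change the code paths.
solve(n) = sum(sqrt.(rand(n)))

solve(10)                  # small problem, just to trigger compilation
@time solve(10_000_000)    # large problem, timing now excludes compilation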

1 Like

Basically you want to compile your function before the call you time. Calling @code_native seems to compile your function, and you can call code_native with a dummy buffer to avoid printing the output. Maybe that doesn’t always work, or there’s a better way to trigger compilation.

julia> function test()
           A = [x for x=0.0:2]
       end
test (generic function with 1 method)

julia> code_native(IOBuffer(),test,())

julia> @time test()
  0.000030 seconds (6 allocations: 352 bytes)
3-element Array{Float64,1}:
 0.0
 1.0
 2.0

2 Likes

You could use precompile for this.

https://docs.julialang.org/en/v1/base/base/#Base.precompile
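
For the earlier example, that would look roughly like this (precompile takes the function and a tuple of argument types):

function test()
    A = [x for x = 0.0:2]
end

precompile(test, ())   # compile the zero-argument method ahead of the timed call
@time test()           # now mostly measures runtime, not compilation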

3 Likes

Does it precompile recursively (i.e. also the functions it calls)?

I think that’s impossible in full generality without actually running the code. It may be possible to recursively precompile the calls that are resolved with static dispatch, but that might be problematic too (because it might precompile too much).

Can you test how much time is needed to run something twice? Is it 20% longer or 99% longer? Depending on the result you will need a different strategy: in the 20% case I would just live with it, and in the second case I would look into precompiling.
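
A closely related quick check is to time a cold and a warm call and compare them (run_my_tests is just a placeholder for whatever you want to measure):

@time run_my_tests()   # first call: compilation + runtime
@time run_my_tests()   # second call: runtime only

# If the two numbers are close, compilation is not the problem; if the
# first call dominates, precompiling is worth investigating.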

I found some more stuff that might be of interest to you:

https://docs.julialang.org/en/v1/manual/profile/#Memory-allocation-analysis-1

Especially Coverage.jl, since memory allocation can be tested for individual functions with it. This could be integrated into a workflow where changed code gets tested automatically for memory allocation regressions.
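
The workflow from that manual section looks roughly like this; if I remember correctly, Coverage.jl’s analyze_malloc helper can then summarize the generated .mem files (the paths below are placeholders):

# Start Julia with allocation tracking, e.g.:  julia --track-allocation=user
using Profile

include("test/runtests.jl")    # warm-up run, so compilation is not counted
Profile.clear_malloc_data()    # discard allocations from the warm-up
include("test/runtests.jl")    # measured run

# After quitting Julia, *.mem files appear next to the source files.
# They can be summarized with, e.g.:
#     using Coverage
#     analyze_malloc("src")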

Thanks for all the advice! Several good suggestions here, which we’ll look further into.

I am a bit skeptical about relying on code coverage for any part of this, since I find that it’s not working very well in Julia 0.7. I just started a new topic about this.

julia -O, --optimize={0,1,2,3} Set the optimization level (default 2 if unspecified or 3 if specified as -O)

I don’t know why Julia got slower, but I assume, since there’s a tradeoff between compilation speed and optimization, that the default level is now more aggressive. Maybe a lower level, e.g. -O1, helps?

Changing the optimization level seems like a bad idea when you want to test the performance.

2 Likes

Well, at least for coverage it’s quite usual to lower the optimisation level. I did coverage tests in C++ projects and I also had to prevent inlining and all kinds of other optimisations to get accurate coverage.
All this should of course be done in a separate CI job and with its own compilation routine.

So yes, I fully agree, for performance tests it’s nonsense to lower the optimisation level :wink:

1 Like

Does inlining affect coverage in Julia?

Is precompile not sufficient to trigger compilation at the using PackageName step?
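
What I have in mind is putting the directive inside the package module itself, something like this (MyPackage and heavy_computation are made-up names):

module MyPackage

heavy_computation(x::Float64) = sqrt(x) + 1.0

# Runs while the package is being precompiled, so (at least the inferred
# code for) this specialization ends up in the package's compile cache.
precompile(heavy_computation, (Float64,))

end # module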

An update on how we solved this:

We created a macro @bench that can be placed in front of our existing @testset macros. Then we made it possible to run our test suite in two modes: without benchmarking, in which case @bench does nothing, and with benchmarking, in which case @bench benchmarks the code within the @testset using BenchmarkTools, with configurable settings.
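
Stripped down to its core, the idea is roughly the following; the real macro has configurable settings and feeds a report rather than just logging, and BENCHMARK_MODE here is only a stand-in for our actual configuration:

using BenchmarkTools, Test

const BENCHMARK_MODE = get(ENV, "BENCHMARK_MODE", "false") == "true"

macro bench(testset)
    if BENCHMARK_MODE
        quote
            local trial = BenchmarkTools.@benchmark $(esc(testset)) samples=3 evals=1
            @info "benchmark" time=minimum(trial).time memory=minimum(trial).memory
            $(esc(testset))   # finally run the testset normally as well
        end
    else
        esc(testset)
    end
end

@bench @testset "matrix multiply" begin
    A = rand(100, 100)
    @test size(A * A) == (100, 100)
end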

For every commit, we simultaneously start one build without benchmarking (just tests; this finishes fairly quickly), and one build with benchmarking (which takes longer; ~20 minutes). This way, we get feedback as soon as possible if any test fails, and within ~20 minutes we can ensure that CPU time and memory did not regress. If there’s ever a regression, we know the exact commit in which it occurred, and from within the benchmark report we can click through to view the commit diff.

It should be noted that we are focusing on major regressions here, typically resulting from bugs or type instability introduced during feature work or refactoring.

By using this @bench macro, we make it trivial to enable benchmarking (no need to write and maintain separate benchmarking scripts), and we get very good benchmarking coverage (pretty much all tests are also benchmarked). We can also reuse the same hierarchy and structure that we already have for our @testsets.

Sample screenshot below of our benchmark report. The test hierarchy is on the left, and each commit creates a new column (split into two cells for CPU / memory) appended on the right. You can see below how four tests regressed in the second-to-last commit (red cells; CPU and memory grew by several hundred percent). This was caused by a bug that one of our developers had introduced, and it might otherwise have gone undetected.

12 Likes

@bennedich Would you be able to publicly share the @bench macro and the code for generating the benchmark reports shown in your image?

2 Likes

I believe I found one of the packages to which @bennedich was referring: https://github.com/nep-pack/NonlinearEigenproblems.jl

Unfortunately, the package is licensed under the GPL, which means that other package authors will not be able to use the @bench macro without relicensing their own packages under the GPL.

I’ve opened an issue (https://github.com/nep-pack/NonlinearEigenproblems.jl/issues/209) to see whether the authors of NonlinearEigenproblems.jl would be willing to release the @bench macro and their other benchmarking utility code under the MIT license.

Hi Dilum, I’ve created a new repository with just the benchmarking utilities (and a trivial test application that can be used for testing) under the MIT license here:

https://github.com/maxbennedich/julia-regression-analysis

It’s not an independent package, so you’ll probably need to adapt it a bit to your project. Let me know if you have any questions.

4 Likes

Well, I noticed (on my very old laptop) that a lower optimization level (-O1, if I recall) can be much faster. If your test scripts are only checking for correctness (or at least not for failures resulting from compiler bugs) and/or your test data is small, then I would consider this.

1 Like