Surprising test timings

When I run tests on a package I’ve been working on, the output shows (for a subset of tests):

Test Summary: | Pass  Total   Time
Styles        |  775    775  49.0s
Test Summary:       | Pass  Total  Time
Conditional Formats |  462    462  3.7s
  DataBar           |   59     59  1.5s
  colorScale        |   53     53  0.7s
  iconSet           |   94     94  0.1s
  cellIs            |   40     40  0.6s
  containsText      |   34     34  0.1s
  top10             |   34     34  0.0s
  aboveAverage      |   42     42  0.5s
  timePeriod        |   37     37  0.0s
  expression        |   34     34  0.0s
  containsErrors    |   35     35  0.0s

When I compare against a wall clock, 49 seconds is well within a factor of 2 of the actual time taken for the 775 Styles tests. However, the 462 Conditional Formats tests take 2 minutes and 45 seconds by the wall clock - about 45 times longer than the reported 3.7s!

If I wrap these two @testset groups in @time, the reported times agree with the times in the test summary, not with the wall clock.

What might give rise to the significant difference between measured and actual elapsed time? And what might such a big discrepancy for Conditional Formats indicate?

If I had to guess, I’d suspect the “missing” time is all compilation time. It can sometimes be tricky to ensure compilation time gets tracked appropriately, and tests are often exercising lots of new types and functions. It’s not unlikely for one testset to have lots more compilation time than another.

Depending upon how you’ve written things — and how the @testset macro itself works — it can be very easy to have Julia start compiling before it starts timing. The timing loop itself can get compiled! It’d probably be good for the @testset macro to try harder to catch this… if my guess is indeed right.
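
If that guess is right, one way to surface the missing time is to put the tests in their own file and include it inside @time, so that parsing and compilation happen while the clock is running (the file name below is just illustrative):

# Timing a testset written inline can let Julia compile the top-level
# expression before the timer starts. Pulling the same tests in via
# include forces parse + compile + run to happen inside the timed region.
@time include("conditional_formats_tests.jl")  # hypothetical file name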

I learned yesterday (here):

Keyword input types are specialized for compilation (this is different from the “method specialization” where you write multiple methods with varied positional arguments)

The functions in the Conditional Formats tests make extensive use of a mixed set of keywords to control the formatting options, and very many combinations of keywords are possible. I guess this means that almost every invocation of each function needs to be compiled separately. I modelled this (very loosely) on my experience with Makie.jl.
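
A tiny sketch of that effect (the function name is made up, not my package’s API) - each new combination of keyword-argument types triggers a fresh compilation of the method body:

apply_format(; fill = "none", border = "thin") = string(fill, '/', border)

@time apply_format(fill = "red")              # first call: compiles
@time apply_format(fill = "red")              # same types: cached, fast
@time apply_format(fill = "red", border = 1)  # new type combination: compiles again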

Maybe there is a more efficient way?

Keyword arguments in Julia are geared towards performance: while you can’t dispatch on them directly, you do get specialization for each combination of types. This can be problematic if you have a ton of keyword arguments that don’t have a big performance impact - then you just pay for a lot of compilation.

In AlgebraOfGraphics for example, I avoid this problem by converting keyword arguments to dicts as the first step in the pipeline. That means only a tiny function gets compiled multiple times, and the big functions after that only ever see the same dict type, like Dict{Symbol,Any}. If you later do need type stability for a hot loop or so, you can just extract the relevant keyword from the dict in front of a function barrier. Then inside that inner function, you regain stability, but you only compile that localized part with different specializations instead of everything.
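
A minimal sketch of the pattern (the names are illustrative, not actual AlgebraOfGraphics internals):

# The entry point collects the keywords into a Dict straight away, so only
# this tiny method is recompiled per combination of keyword types.
process(; kwargs...) = _process(Dict{Symbol,Any}(kwargs))

# Everything downstream sees a single concrete type: Dict{Symbol,Any}.
function _process(opts::Dict{Symbol,Any})
    n = get(opts, :n, 10)  # n is inferred as Any here, which is fine
    _hot_loop(n)           # function barrier: one dynamic dispatch
end

# Behind the barrier the argument is concrete again, so only this small
# function is specialized per actual type of n, not the whole pipeline.
_hot_loop(n) = sum(i^2 for i in 1:n)

Calling process(n = 5) compiles _hot_loop once for Int; adding new keyword combinations later only ever recompiles the tiny entry point.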

Thank you @jules. This is very helpful and looks like it will suit my situation well.

I have a pipeline of functions for each method. The first chooses the desired method and then calls one or two functions to resolve the arguments into a standard form. These pre-processor functions don’t use the keyword arguments at all. They in turn call the main worker method, which uses the keywords to carry out the desired action.

It doesn’t sound too hard to use the first function in the pipeline to pack the keywords into a Dict and have the workers either unpack them again or use them from the Dict directly. All but one of my keywords are strings, so I could use Dict{String, String} and keep the lone Vector{Int} out of it.
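
Something like this, perhaps (the names are invented, just to sketch the plan, and it assumes the remaining keywords really are all strings):

function setformat(sheet; rows::Vector{Int} = Int[], kwargs...)
    # pack the string keywords into one concrete Dict; the lone
    # Vector{Int} stays a regular, concretely typed keyword argument
    opts = Dict{String,String}(string(k) => v for (k, v) in kwargs)
    _apply_format(sheet, rows, opts)
end

# Only one instance of the worker is ever compiled, whichever
# keywords were supplied.
function _apply_format(sheet, rows::Vector{Int}, opts::Dict{String,String})
    fill = get(opts, "fill", "none")
    (sheet, rows, fill)
end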

I’ll definitely give it a go!