@time and @benchmark give different results

[Corrected: ms is 10^{-3} seconds and μs is 10^{-6}.]

I’m evaluating 2 algorithms for doing roughly the same thing, using @time and @benchmark. The ratio of times between the 2 timing methods differs by a factor of roughly 10.

Also the times reported by @time are inconsistent with the range of times reported by @benchmark. Could anyone help me understand what’s going on?

# call this invocation of fit "Method 2" since function call has 2 arguments
julia> @time r2 = RBSilly.fit(RBSilly.yx, 0.0002);
  0.043844 seconds (9.14 k allocations: 35.169 MiB, 39.28% gc time)

julia> @time r2 = RBSilly.fit(RBSilly.yx, 0.0002)
  0.033817 seconds (9.14 k allocations: 35.169 MiB)

## and call the next "Method 3"
julia> @time r = RBSilly.fit(RBSilly.yx, 0.0002, 4);
  0.011290 seconds (7.71 k allocations: 382.953 KiB)

Based on this admittedly small sample Method 2 is around 30-40 ms (milliseconds),
Method 3 is ~ 11 ms, 3 to 4 x faster than Method 2. But

# Caution: reversed the order of the 2 methods relative to @time panel
julia> @benchmark RBSilly.fit(RBSilly.yx, 0.0002, 4);

julia> @benchmark RBSilly.fit(RBSilly.yx, 0.0002, 4)
BenchmarkTools.Trial: 6673 samples with 1 evaluation.
 Range (min … max):  336.614 μs … 53.935 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     363.297 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   744.718 μs ±  1.802 ms  ┊ GC (mean ± σ):  3.36% ± 6.20%

  ███▆▆████▅▅▄▅▆▆▆▆▇▆▅▆▅▆▅▅▆▅▅▅▄▆▅▅▄▆▅▅▄▄▅▅▆▅▆▃▄▃▃▄▄▄▃▃▄▃▂▄▃▄▃ █
  337 μs        Histogram: log(frequency) by time      6.73 ms <

 Memory estimate: 348.70 KiB, allocs estimate: 7627.

julia> @benchmark RBSilly.fit(RBSilly.yx, 0.0002)
BenchmarkTools.Trial: 250 samples with 1 evaluation.
 Range (min … max):  18.043 ms … 27.407 ms  ┊ GC (min … max):  0.00% … 15.60%
 Time  (median):     20.347 ms              ┊ GC (median):    10.53%
 Time  (mean ± σ):   20.046 ms ±  1.930 ms  ┊ GC (mean ± σ):   6.35% ±  5.67%

@benchmark shows Method 2 is 20 ms and Method 3 is ~ 0.36 ms (or perhaps 0.74 if we use the mean). So Method 3 is ~ 27 - 55 x faster than Method 2. The @timer value for Method 2 of 30-40 ms is considerably above the max of 27 reported by @benchmark, while the @timer value for Method 3 of 11 ms is within @benchmark's wide range of 0.3 - 53 ms, though far from the central estimate.

BTW, Method 3 is supposed to be a shortcut that gets most of the fit with less work than Method 2. But Method 3 does require additional setup, and so it’s unclear which will be faster in practice. Both timing methods seem to agree that Method 3 is faster, though differing in how much faster.

Environment: julia 1.6.2 on 64 bit Intel, Debian GNU/Linux (julia is tarball direct from julia.org)
Running under VS Code with functions defined in a file obtained through Julia: Execute file in REPL.
I believe everything had been compiled before I ran the code excerpted above.

millisecond is 10^{-3} and microsecond is 10^{-6}.

1 Like

Thanks @Elrod for catching a silly mistake. I have reworked my original report extensively with that in mind, and things are a lot less weird. But they are still oddly inconsistent.

I suppose the differences could all be random variation, but it’s hard to believe that’s all that’s happening.


A fully reproducible example would be helpful (see Please read: make it easier to help you)

In the case of @benchmark, shouldn’t you quote the first argument?

@benchmark RBSilly.fit($RBSilly.yx, 0.0002, 4)

Several independent points:

  1. Quoting the argument doesn’t seem to affect the results much.
  2. In the very first @benchmark result I reported the max GC is 0.00, yet the mean and sd are positive!
  3. The results are consistent with the theory that @time has a fixed overhead of about 11 ms compared to @benchmark–adding 11 ms to the results from the latter gives you the former. So maybe that’s what’s going on.

I originally thought I was unlikely to see the problem in a trimmed down example, but 3. seems testable. Unfortunately my first attempt at a MRE was a little too minimal:

julia> @time 1
  0.000000 seconds

and a simple function call does not produce anything like 0.01 for duration:

julia> @time time();
  0.000001 seconds

Could be constant propagation, caching, or the GC being GC (sporadic and variable). The display of the max GC looks like a bug? (edit: likely Trial display assumes the max GC comes from the sample with the max time · Issue #248 · JuliaCI/BenchmarkTools.jl · GitHub)

Definitely not a constant overhead to @time.

I should add that @benchmark isn’t doing much more than @time itself. It’s just in a loop and in a function. You can get one step closer by doing something like:

function test() = [@timed(RBSilly.fit(RBSilly.yx, 0.0002, 4)) for _ in 1:100]

Note that this also gives you a solid mental model for what that fancy $ does:

function test_with_dollars(f, a, b, c) = [@timed(f(a, b, c)) for _ in 1:100]

Now doing test_with_dollars(RBSilly.fit, RBSilly.yx, 0.0002, 4) is comparable to @benchmark $(RBSilly.fit)($RBSilly.yx, $0.0002, $4).