Understanding output of @time and @btime under cgroup CPUQuota constraints

Hello, I was testing cgroup CPU/memory quotas on an Ubuntu 22.04 system and realised that while @time reacts to the changes I make to CPUQuota (and indeed the whole system does slow down, meaning the limit is enforced), @btime instead continues to report the same value whatever CPUQuota I set.
I suspect this is intended, but can someone then explain which times the two are measuring (other than the compilation difference)?

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0 (2023-05-07)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using BenchmarkTools

julia> # with CPUQuota=800%

julia> @time sum(rand() for _ in 1:10000)
  0.043508 seconds (70.31 k allocations: 4.675 MiB, 83.42% compilation time)
5025.053802143937

julia> @time sum(rand() for _ in 1:10000)
  0.034323 seconds (50.96 k allocations: 3.403 MiB, 99.45% compilation time)
5000.63699353686

julia> @time sum(rand() for _ in 1:10000)
  0.034231 seconds (50.96 k allocations: 3.424 MiB, 99.46% compilation time)
4995.095574082788

julia> @btime sum(rand() for _ in 1:10000)
  13.291 µs (0 allocations: 0 bytes)
4978.794767720662

julia> @btime sum(rand() for _ in 1:10000)
  13.293 µs (0 allocations: 0 bytes)
5012.7544011688715

julia> # with CPUQuota=20%

julia> @time sum(rand() for _ in 1:10000)
  0.207326 seconds (50.97 k allocations: 3.405 MiB, 99.90% compilation time)
4956.722886706571

julia> @time sum(rand() for _ in 1:10000)
  0.202435 seconds (50.96 k allocations: 3.406 MiB, 99.90% compilation time)
5019.472046551035

julia> @btime sum(rand() for _ in 1:10000)
  13.226 µs (0 allocations: 0 bytes)
5029.354066339357

julia> @btime sum(rand() for _ in 1:10000)
  13.234 µs (0 allocations: 0 bytes)
5030.121947525575

You’re measuring compilation time of the generator here, so that timing is not comparable to the @btime output. FWIW, both @time and @btime use wall-clock timers (clock_gettime(CLOCK_MONOTONIC) and gettimeofday, respectively), and AFAIU those clocks keep ticking regardless of CPU quotas, i.e., they will report longer timings when your code is being throttled by a CPU quota.

EDIT: I was wrong, BenchmarkTools also uses time_ns() for its actual timings. The point still holds though.
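
To make that concrete, here’s a minimal sketch of a wall-clock timing macro built on time_ns() (simplified, of course; the real @time additionally tracks allocations, GC, and compilation time):

macro walltime(ex)
    quote
        t0 = time_ns()              # monotonic wall-clock timestamp, in nanoseconds
        val = $(esc(ex))            # evaluate the expression being timed
        println("  ", (time_ns() - t0) / 1e9, " seconds")
        val
    end
end

Since time_ns() keeps advancing while the process is descheduled, a timer like this charges any quota-induced waiting to the measured expression.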

No, it doesn’t 🙂

As you can see from the session above, while @time is indeed affected by CPUQuota (smaller quota, longer time), @btime reports the same value whatever the CPUQuota…

Sure it does. You’re measuring compilation time, not the execution time of your function. The output even clearly shows 99.90% compilation time… BenchmarkTools.jl evaluates your expression many times, so compilation happens only once and is excluded from the reported minimum, while every @time invocation at the REPL creates a fresh generator definition that has to be compiled all over again. Observe:

julia> @time sum(rand() for _ in 1:10000)
  0.033058 seconds (49.93 k allocations: 3.694 MiB, 99.63% compilation time)
5036.434770860053

julia> @time sum(rand() for _ in 1:10000)
  0.021470 seconds (39.13 k allocations: 2.596 MiB, 99.29% compilation time)
5034.401529381379

julia> f() = sum(rand() for _ in 1:10000)
f (generic function with 1 method)

julia> @time f()
  0.000018 seconds
5011.031523075842

julia> @time f()
  0.000020 seconds
4947.960727818074

The large difference in timing between @time and @btime due to compilation is expected by the OP, who already noted the compilation difference in their question.

You’ll have to compare something like

julia> f() = sum(rand() for _ in 1:10000)
f (generic function with 1 method)

julia> @time f()
  0.000007 seconds
4997.016675932721

with the output of @btime f() in order to meaningfully compare @time and @btime for this particular test. One important aspect to note is that your CPU quota setting more or less determines how many time slices your process receives in a given interval. If there are 100 timeslices available, setting the quota to 8% means that your program will receive 8 timeslices of that interval (likewise, 800% can be interpreted as “100 of the 100 per core for 8 cores”).
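
As a back-of-envelope sketch (my assumption: the quota simply scales the CPU time available per wall-clock second, ignoring scheduling granularity), that model is consistent with the @time numbers above:

# Assumption: quota linearly scales available CPU time (granularity ignored)
expected_walltime(cpu_seconds, quota) = cpu_seconds / quota

expected_walltime(0.034, 1.0)   # ≈ 0.034 s: one busy core, effectively unthrottled at 800%
expected_walltime(0.034, 0.2)   # ≈ 0.17 s: CPUQuota=20%, close to the observed ~0.2 s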

In a modern OS with preemptive scheduling, the kernel regularly yanks control from a userspace program and sets another program as the currently running piece of code on a particular core. The current default smallest timeslice for userspace on Linux is 100 ms, which means that in order to see an actual slowdown in wall time, you’re going to have to run a workload that takes longer than that minimum timeslice per evaluation. In some schedulers there is no fixed minimum timeslice, but even those are much larger than the 13 µs you observed.

So barring compilation, whose runtime is longer than those timeslices (and which you can properly exclude as described above), I’d expect @time and @btime to be effectively identical here (modulo tiny differences due to branch prediction), for this particular workload.
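
If you want to see @btime react to the quota, one sketch (hypothetical workload, numbers untested) is to make a single evaluation take far longer than a scheduler timeslice, so throttling can kick in mid-evaluation:

using BenchmarkTools

# On the order of seconds per call instead of microseconds
slow() = sum(rand() for _ in 1:10^9)

@btime slow()   # expect this to scale with CPUQuota, unlike the 13 µs case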


Yes, indeed I know the difference between @time and @btime in terms of compilation; my question was why, comparing @time and @btime with no CPU constraint (CPUQuota=800%) and with a CPU constraint (CPUQuota=20%), only the first one (@time) was showing (large) differences.

I thought they measured different things, not in terms of compilation yes/no, but in terms of wall time vs. CPU time. I think I understand now that it is instead simply a threshold effect: both measure wall time, but I see the effect of the CPU restriction only in @time because its runtime goes above that threshold… thank you for clarifying…


There seems to be a package for that: CPUTime.jl, which exports a @CPUtime macro that should measure exactly this (it’s a bit old, but I tested it locally and it still works). Even including compilation time, it should report similar times regardless of quota.
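
A quick usage sketch, assuming @CPUtime wraps an expression just like @time does (which is how it behaved in my local test):

using CPUTime

f() = sum(rand() for _ in 1:10000)
f()            # warm up once so compilation doesn't dominate

@CPUtime f()   # reports CPU time consumed, which should be roughly quota-independent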
