Benchmarking the cost of allocation

I would like to have an educated guess of how much speed I would gain if I rewrote a rather large codebase with preallocated buffers (all the other low-hanging fruit I could think of has been harvested at this point, e.g. checking type stability, profiling, fixing hot loops, using static arrays, etc.).

Essentially, I would like to measure just the cost of allocations, separately from the rest of the computation. I can of course count allocations with BenchmarkTools and @time, but I don't know how to translate that count into a performance cost.

Conceptually, I would like to do something like the comparison

using BenchmarkTools
f(a, b) = map((a, b) -> √abs(a) + exp(b), a, b)
f!(result, a, b) = map!((a, b) -> √abs(a) + exp(b), result, a, b)
a = randn(1000)
b = randn(1000)
result = similar(a)
@btime f($a, $b);
@btime f!($result, $a, $b);

without writing f!.
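
The closest I can think of is timing just the allocation of the output buffer on its own, continuing from the snippet above (only a rough sketch, and I doubt it captures the GC pressure the allocations add to the rest of the program):

# cost of allocating the result buffer alone, without any computation
@btime similar($a);

# bytes allocated by a single call to f, as a sanity check
@allocated f(a, b)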

Any hints would be appreciated.


Time spent in GC is one measure of this (it is reported by @benchmark).

Accurate prediction seems extremely hard, though: even if you measure the time spent allocating and running GC, it is difficult to predict the other effects allocation has on the system (such as how it influences the caches).
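
If it helps, the raw numbers are also available on the Trial object, so something like this (just a sketch, using the times and gctimes fields of current BenchmarkTools) gives the fraction of the measured time that went to GC:

t = @benchmark f($a, $b)                       # f, a, b as defined in the first post
gc_fraction = sum(t.gctimes) / sum(t.times)    # fraction of sampled time spent in GC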


Yeah, I often find the speedup in practice is quite a lot greater than what I would have guessed from the time previously spent in GC.


Yes, in the example above GC time is 0 for me (for f).

Expanding on this: I get

julia> @benchmark f($a, $b)
BenchmarkTools.Trial: 
  memory estimate:  8.00 KiB
  allocs estimate:  4
  --------------
  minimum time:     12.665 μs (0.00% GC)
  median time:      15.040 μs (0.00% GC)
  mean time:        15.395 μs (0.00% GC)
  maximum time:     186.778 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

I wonder why GC time is consistently zero. Should this be reported as an issue for BenchmarkTools or Julia?

BenchmarkTools runs GC itself between samples, so that might take care of all the GC needed.

You could try setting

BenchmarkTools.DEFAULT_PARAMETERS.gctrial = false

and see if anything changes.

Thanks, but that didn’t change anything. Neither did

BenchmarkTools.DEFAULT_PARAMETERS.samples = 1_000_000

My understanding is that allocating dynamically is costly for two reasons:

  1. the allocation itself (creating a new array, etc.), and
  2. the time spent in GC.

The way I understand this discussion is that it is currently conceptually difficult to measure (1), while for (2) we have facilities, but I am not able to trigger them in practice.
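
Perhaps something like this could isolate (1) by switching the collector off around the measurement? Just a sketch, and I am not sure how reliable it is (memory obviously keeps growing while the GC is disabled):

GC.gc()            # start from a clean slate
GC.enable(false)   # disable automatic collection, so only the raw allocation cost remains
@btime f($a, $b);
GC.enable(true)    # re-enable the collector afterwards
GC.gc()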

I get

julia> @benchmark f($a, $b)
BenchmarkTools.Trial:
  memory estimate:  8.00 KiB
  allocs estimate:  4
  --------------
  minimum time:     8.092 μs (0.00% GC)
  median time:      8.289 μs (0.00% GC)
  mean time:        8.581 μs (1.87% GC)
  maximum time:     9.631 ms (99.85% GC)
  --------------
  samples:          188641
  evals/sample:     3

so for me, some GC time is shown.

Your maximum time is quite low, so it just doesn't seem that any GC was triggered during the measurements.
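
If you want to see the GC cost show up at all, a quick (and rough) option is to run the allocating version enough times in a single timed block that collections actually happen; @time then reports the percentage of time spent in GC:

@time for _ in 1:100_000
    f(a, b)   # roughly 8 KiB of garbage per call, so collections will trigger eventually
end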
