Benchmarking the cost of allocation

I would like to have an educated guess of how much speed I would gain if I rewrote a rather large codebase with preallocated buffers (all the other low-hanging fruit I could think of has been harvested at this point, e.g. checking type stability, profiling, fixing hot loops, using static arrays, etc.).

Essentially, I would like to measure just the cost of allocations, separately from the rest of the computation. I can of course count allocations with BenchmarkTools and @time, but I don't know how to translate that count into a performance cost.

Conceptually, I would like to do something like the comparison

using BenchmarkTools
f(a, b) = map((a, b) -> √abs(a) + exp(b), a, b)
f!(result, a, b) = map!((a, b) -> √abs(a) + exp(b), result, a, b)
a = randn(1000)
b = randn(1000)
result = similar(a)
@btime f($a, $b);
@btime f!($result, $a, $b);

without writing f!.
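
The closest I can think of is timing just the allocation of the output buffer on its own, continuing from the snippet above (only a rough sketch, and I doubt it captures the GC pressure the allocations add to the rest of the program):

# cost of allocating the result buffer alone, without any computation
@btime similar($a);

# bytes allocated by a single call to f, as a sanity check
@allocated f(a, b)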

Any hints would be appreciated.


Time spent in GC is one measure of this (it is reported by @benchmark).

Accurate prediction seems extremely hard, though: even if you measure the time spent allocating and running GC, it is difficult to predict the other effects allocation has on the system (such as how it influences the caches).
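
If it helps, the raw numbers are also available on the Trial object, so something like this (just a sketch, using the times and gctimes fields of current BenchmarkTools) gives the fraction of the measured time that went to GC:

t = @benchmark f($a, $b)                       # f, a, b as defined in the first post
gc_fraction = sum(t.gctimes) / sum(t.times)    # fraction of sampled time spent in GC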


Yeah, I often find the speedup in practice is quite a lot greater than what I would have guessed from the time previously spent in GC.


Yes, in the example above GC time is 0 for me (for f).

Expanding on this: I get

julia> @benchmark f($a, $b)
BenchmarkTools.Trial: 
  memory estimate:  8.00 KiB
  allocs estimate:  4
  --------------
  minimum time:     12.665 μs (0.00% GC)
  median time:      15.040 μs (0.00% GC)
  mean time:        15.395 μs (0.00% GC)
  maximum time:     186.778 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

I wonder why GC time is consistently zero. Should this be reported as an issue for BenchmarkTools or Julia?

BenchmarkTools runs GC itself between samples, so that might take care of all the GC needed.

You could try setting

BenchmarkTools.DEFAULT_PARAMETERS.gctrial = false

and see if anything changes.

Thanks, but that didn’t change anything. Neither did

BenchmarkTools.DEFAULT_PARAMETERS.samples = 1_000_000

My understanding is that allocating dynamically is costly for two reasons:

  1. the allocation itself (creating a new array, etc.), and
  2. the time spent in GC.

The way I understand this discussion is that it is currently conceptually difficult to measure (1), while for (2) we have facilities, but I am not able to trigger them in practice.
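
Perhaps something like this could isolate (1) by switching the collector off around the measurement? Just a sketch, and I am not sure how reliable it is (memory obviously keeps growing while the GC is disabled):

GC.gc()            # start from a clean slate
GC.enable(false)   # disable automatic collection, so only the raw allocation cost remains
@btime f($a, $b);
GC.enable(true)    # re-enable the collector afterwards
GC.gc()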

I get

julia> @benchmark f($a, $b)
BenchmarkTools.Trial:
  memory estimate:  8.00 KiB
  allocs estimate:  4
  --------------
  minimum time:     8.092 μs (0.00% GC)
  median time:      8.289 μs (0.00% GC)
  mean time:        8.581 μs (1.87% GC)
  maximum time:     9.631 ms (99.85% GC)
  --------------
  samples:          188641
  evals/sample:     3

so for me, some GC time is shown.

Your maximum time is quite low, so it just doesn't seem that any GC was triggered during the measurements.
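
If you want to see the GC cost show up at all, a quick (and rough) option is to run the allocating version enough times in a single timed block that collections actually happen; @time then reports the percentage of time spent in GC:

@time for _ in 1:100_000
    f(a, b)   # roughly 8 KiB of garbage per call, so collections will trigger eventually
end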
