# Benchmarking the cost of allocation

I would like an educated guess of how much speed I would gain if I rewrote a rather large codebase to use preallocated buffers (all the other low-hanging fruit I could think of has been picked at this point, e.g. checking type stability, profiling, fixing hot loops, static arrays, etc.).

Essentially, I would like to measure just the cost of allocations, separately from the rest of the computation. I can of course count allocations with BenchmarkTools and `@time`, but I don’t know how to translate those counts into a performance estimate.

Conceptually, I would like to do something like the comparison

```julia
using BenchmarkTools

f(a, b) = map((a, b) -> √abs(a) + exp(b), a, b)
f!(result, a, b) = map!((a, b) -> √abs(a) + exp(b), result, a, b)

a = randn(1000)
b = randn(1000)
result = similar(a)

@btime f($a, $b);
@btime f!($result, $a, $b);
```

without actually having to write `f!`.

Any hints would be appreciated.
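One rough approach (not a complete answer, just a sketch): benchmark the buffer allocation by itself, which gives a lower bound on the time the allocating version spends creating its output array. This ignores GC pressure and cache effects, so it underestimates the true cost.

```julia
using BenchmarkTools

a = randn(1000)

# Time only the allocation that f would perform internally for its
# result buffer. This is a lower-bound estimate of the per-call
# allocation cost; it does not capture GC or cache effects.
@btime similar($a);
```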


Time spent in GC (as reported by `@benchmark`) is one such measurement.

Accurate prediction seems extremely hard: even if you measure the time spent allocating and running GC, it is difficult to predict the other effects allocation has on the system (e.g. how it influences cache behavior).
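To make that GC measurement concrete, here is a sketch of pulling the per-sample GC times out of the `Trial` object that `@benchmark` returns (its `times` and `gctimes` fields, both in nanoseconds):

```julia
using BenchmarkTools

f(a, b) = map((a, b) -> √abs(a) + exp(b), a, b)
a = randn(1000)
b = randn(1000)

trial = @benchmark f($a, $b)

# Fraction of the total measured time spent in the garbage collector,
# aggregated over all samples.
gc_fraction = sum(trial.gctimes) / sum(trial.times)
```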


Yeah, I often find the speedup in practice is quite a lot greater than what I would have guessed from the time previously spent in GC.


Yes, in the example above GC time is 0 for me (for `f`).

Expanding on this: I get

```julia
julia> @benchmark f($a, $b)
BenchmarkTools.Trial:
  memory estimate:  8.00 KiB
  allocs estimate:  4
  --------------
  minimum time:     12.665 μs (0.00% GC)
  median time:      15.040 μs (0.00% GC)
  mean time:        15.395 μs (0.00% GC)
  maximum time:     186.778 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

I wonder why GC time is consistently zero. Should this be reported as an issue for BenchmarkTools or Julia?

BenchmarkTools runs GC itself between samples, so that might take care of all the GC needed.

You could try setting

```julia
BenchmarkTools.DEFAULT_PARAMETERS.gctrial = false
```

and see if anything changes.

Thanks, but that didn’t change anything. Neither did

```julia
BenchmarkTools.DEFAULT_PARAMETERS.samples = 1_000_000
```

My understanding is that allocating dynamically is costly because of:

1. the allocation itself (creating a new array, etc.)
2. GC time

The way I understand this discussion is that it is currently conceptually difficult to measure (1), while for (2) we have facilities, but I am not able to trigger them in practice.
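One way to get at (1) separately from (2), as a rough sketch, is to time the function with the collector suspended via `GC.enable(false)`, so that GC pauses cannot contribute to the measurement. Note that memory grows unchecked while GC is off, so keep the loop short; the loop count here is arbitrary.

```julia
f(a, b) = map((a, b) -> √abs(a) + exp(b), a, b)
a = randn(1000)
b = randn(1000)
f(a, b)            # run once so compilation is not timed

GC.gc()            # start from a clean heap
GC.enable(false)   # suspend collection: only raw allocation is timed
t_alloc_only = @elapsed for _ in 1:10_000
    f(a, b)
end
GC.enable(true)    # re-enable and clean up
GC.gc()
```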

I get

```julia
julia> @benchmark f($a, $b)
BenchmarkTools.Trial:
  memory estimate:  8.00 KiB
  allocs estimate:  4
  --------------
  minimum time:     8.092 μs (0.00% GC)
  median time:      8.289 μs (0.00% GC)
  mean time:        8.581 μs (1.87% GC)
  maximum time:     9.631 ms (99.85% GC)
  --------------
  samples:          188641
  evals/sample:     3
```

so for me, some GC time is shown.

Your maximum time is quite low, so it seems no GC was actually triggered during your measurement.
