 # How to benchmark in-place functions?

Hi all,

I wondered how to benchmark functions correctly that modify always the same input in-place. In particular, using the `@btime` macro of BenchmarkTools.jl. For example consider the following minimal code:

``````using BenchmarkTools

const array = [1, 2, 3]

function test_btime_inplace!(array)
push!(array, 4)
end

@btime test_btime_inplace!(array)
``````

Here, I want to benchmark how long it takes to push 4 at the end of `array` with the input array always staying `[1, 2, 3]`. However, what @btime actually benchmarks is pushing 4 at the end of `[1, 2, 3]` in the first iteration, at the end of `[1, 2, 3, 4]` in the second iteration, at the end of `[1, 2, 3, 4, 4]` in the third iteration and so on.

I have the feeling I’m missing something obvious here; is there a way to do this?

I think the `setup` option (in the docs you linked) should do the job?

1 Like

Like this?

``````out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3])
``````

However, when doing that `out` has `length(out)=1002` and many 4’s in it. Suggesting to me, that this does not help (?)

You also need to add `evals=1`.

See:

Note that the `setup` and `teardown` phases are executed for each sample, not each evaluation . Thus, the sorting example above wouldn’t produce the intended results if `evals/sample > 1` (it’d suffer from the same problem of benchmarking against an already sorted vector).

I am not sure why the design is the way it is.

2 Likes

ok, thanks guys. I can do

``````out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3]) evals=1
``````

and indeed out will be `[1, 2, 3, 4]`, which is good.

However, looking at the actual benchmark result, this is now

``````julia> 54.000 ns (1 allocation: 64 bytes)
``````

while the old benchmark

``````out = @btime test_btime_inplace!(array)
``````

gives

``````julia> 7.492 ns (0 allocations: 0 bytes)
``````

So something is weird here I think?

I suspect you’re seeing an artifact of the amortized constant time of `push!` (see https://cs.stackexchange.com/a/9382 for some discussion). The longer the vector gets, the less frequently its capacity is changed, so the more likely it is that your benchmark loop will happen to catch a set of pushes that do not contain a capacity change (and thus don’t contain any memory allocation).

In other words: don’t worry about those few extra nanoseconds and the one allocation here 1 Like

ok perfect, got it. This was really helpful altogether, thanks!