How to benchmark in-place functions?

Hi all,

I wondered how to correctly benchmark functions that always modify the same input in-place, in particular using the @btime macro from BenchmarkTools.jl. For example, consider the following minimal code:

using BenchmarkTools

const array = [1, 2, 3]

function test_btime_inplace!(array)
    push!(array, 4)
end

@btime test_btime_inplace!(array)

Here, I want to benchmark how long it takes to push 4 onto the end of array, with the input always starting as [1, 2, 3]. However, what @btime actually benchmarks is pushing 4 onto [1, 2, 3] in the first iteration, onto [1, 2, 3, 4] in the second, onto [1, 2, 3, 4, 4] in the third, and so on.
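The mutation across iterations is easy to see even without BenchmarkTools; here is a minimal sketch (with hypothetical names a and f!) simulating what repeated evaluations do to the input:

```julia
a = [1, 2, 3]
f!(x) = push!(x, 4)

# @btime calls the function many times; simulate five evaluations by hand:
for _ in 1:5
    f!(a)
end

a  # the input never resets: [1, 2, 3, 4, 4, 4, 4, 4]
```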

I have the feeling I’m missing something obvious here; is there a way to do this?

I think the setup option (in the docs you linked) should do the job?


Like this?

out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3])

However, when doing that, out ends up with length(out) == 1002 and many 4’s in it, suggesting to me that this does not help (?)

You also need to add evals=1.


Note that the setup and teardown phases are executed for each sample, not each evaluation. Thus, the sorting example above wouldn’t produce the intended results if evals/sample > 1 (it’d suffer from the same problem of benchmarking against an already sorted vector).

I am not sure why the design is the way it is.
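For intuition, here is a rough sketch of the sampling loop (illustrative only, not BenchmarkTools’ actual implementation; sample_times is a name of my own invention): setup runs once per sample, but the resulting state is shared by all evaluations inside that sample.

```julia
# Illustrative sketch only — NOT BenchmarkTools' real code.
function sample_times(f!, setup; samples = 3, evals = 4)
    times = Float64[]
    for _ in 1:samples
        state = setup()          # setup runs once per *sample*...
        t0 = time_ns()
        for _ in 1:evals         # ...but all `evals` evaluations reuse the same state
            f!(state)
        end
        push!(times, (time_ns() - t0) / evals)
    end
    return times
end

# With evals > 1, later evaluations in a sample see the already-mutated input:
final_len = Ref(0)
sample_times(a -> (push!(a, 4); final_len[] = length(a)),
             () -> [1, 2, 3]; samples = 2, evals = 3)
final_len[]  # 6 — within one sample the vector grew from 3 to 6 elements
```

This is why setting evals=1 makes setup effectively run before every call.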


Related: BenchmarkTools setup isn't run between each iteration?

ok, thanks guys. I can do

out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3]) evals=1

and indeed out will be [1, 2, 3, 4], which is good.

However, the actual benchmark result is now

54.000 ns (1 allocation: 64 bytes)

while the old benchmark,

out = @btime test_btime_inplace!(array)


gave 7.492 ns (0 allocations: 0 bytes)

So something is weird here I think?

I suspect you’re seeing an artifact of the amortized constant time of push!. The longer the vector gets, the less frequently its capacity changes, so the more likely it is that your benchmark loop happens to catch a run of pushes that does not contain a capacity change (and thus no memory allocation).
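That amortized behavior can be observed directly with @allocated: only a handful of pushes out of many actually allocate, namely when the backing buffer has to grow. A small sketch (count_reallocs is a hypothetical helper of my own naming):

```julia
# Count how many push! calls actually allocate: push! grows the backing
# buffer geometrically, so most pushes just reuse spare capacity.
function count_reallocs(n)
    v = Int[]
    reallocs = 0
    for i in 1:n
        reallocs += (@allocated push!(v, i)) > 0
    end
    return reallocs
end

count_reallocs(1000)  # only a handful of growth events out of 1000 pushes
```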

In other words: don’t worry about those few extra nanoseconds and the one allocation here 🙂


ok perfect, got it. This was really helpful altogether, thanks!