How to benchmark in-place functions?

Hi all,

I wondered how to benchmark functions correctly that modify always the same input in-place. In particular, using the @btime macro of BenchmarkTools.jl. For example consider the following minimal code:

using BenchmarkTools

const array = [1, 2, 3]

function test_btime_inplace!(array)
    push!(array, 4)
end

@btime test_btime_inplace!(array)

Here, I want to benchmark how long it takes to push 4 at the end of array with the input array always staying [1, 2, 3]. However, what @btime actually benchmarks is pushing 4 at the end of [1, 2, 3] in the first iteration, at the end of [1, 2, 3, 4] in the second iteration, at the end of [1, 2, 3, 4, 4] in the third iteration and so on.

I have the feeling I’m missing something obvious here; is there a way to do this?

I think the setup option (in the docs you linked) should do the job?

1 Like

Like this?

out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3])

However, when doing that out has length(out)=1002 and many 4’s in it. Suggesting to me, that this does not help (?)

You also need to add evals=1.

See:

Note that the setup and teardown phases are executed for each sample, not each evaluation . Thus, the sorting example above wouldn’t produce the intended results if evals/sample > 1 (it’d suffer from the same problem of benchmarking against an already sorted vector).

I am not sure why the design is the way it is.

2 Likes

Related: BenchmarkTools setup isn't run between each iteration?

ok, thanks guys. I can do

out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3]) evals=1

and indeed out will be [1, 2, 3, 4], which is good.

However, looking at the actual benchmark result, this is now

julia> 54.000 ns (1 allocation: 64 bytes)

while the old benchmark

out = @btime test_btime_inplace!(array)

gives

julia> 7.492 ns (0 allocations: 0 bytes)

So something is weird here I think?

I suspect you’re seeing an artifact of the amortized constant time of push! (see https://cs.stackexchange.com/a/9382 for some discussion). The longer the vector gets, the less frequently its capacity is changed, so the more likely it is that your benchmark loop will happen to catch a set of pushes that do not contain a capacity change (and thus don’t contain any memory allocation).

In other words: don’t worry about those few extra nanoseconds and the one allocation here :slight_smile:

1 Like

ok perfect, got it. This was really helpful altogether, thanks!