How to benchmark in-place functions?

Hi all,

I wondered how to correctly benchmark functions that always modify the same input in-place, in particular using the @btime macro from BenchmarkTools.jl. For example, consider the following minimal code:

using BenchmarkTools

const array = [1, 2, 3]

function test_btime_inplace!(array)
    push!(array, 4)
end

@btime test_btime_inplace!(array)

Here, I want to benchmark how long it takes to push 4 onto the end of array, with the input always starting as [1, 2, 3]. However, what @btime actually benchmarks is pushing 4 onto [1, 2, 3] in the first iteration, onto [1, 2, 3, 4] in the second, onto [1, 2, 3, 4, 4] in the third, and so on.
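The mutation across iterations is easy to see even without BenchmarkTools; here is a minimal sketch (with hypothetical names a and f!) simulating what repeated evaluations do to the input:

```julia
a = [1, 2, 3]
f!(x) = push!(x, 4)

# @btime calls the function many times; simulate five evaluations by hand:
for _ in 1:5
    f!(a)
end

a  # the input never resets: [1, 2, 3, 4, 4, 4, 4, 4]
```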

I have the feeling I’m missing something obvious here; is there a way to do this?

I think the setup option (in the docs you linked) should do the job?


Like this?

out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3])

However, when doing that, out ends up with length(out) == 1002 and many 4’s in it, suggesting to me that this does not help (?)

You also need to add evals=1.


Note that the setup and teardown phases are executed for each sample, not each evaluation. Thus, the sorting example above wouldn’t produce the intended results if evals/sample > 1 (it’d suffer from the same problem of benchmarking against an already sorted vector).

I am not sure why the design is the way it is.
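For intuition, here is a rough sketch of the sampling loop (illustrative only, not BenchmarkTools’ actual implementation; sample_times is a name of my own invention): setup runs once per sample, but the resulting state is shared by all evaluations inside that sample.

```julia
# Illustrative sketch only — NOT BenchmarkTools' real code.
function sample_times(f!, setup; samples = 3, evals = 4)
    times = Float64[]
    for _ in 1:samples
        state = setup()          # setup runs once per *sample*...
        t0 = time_ns()
        for _ in 1:evals         # ...but all `evals` evaluations reuse the same state
            f!(state)
        end
        push!(times, (time_ns() - t0) / evals)
    end
    return times
end

# With evals > 1, later evaluations in a sample see the already-mutated input:
final_len = Ref(0)
sample_times(a -> (push!(a, 4); final_len[] = length(a)),
             () -> [1, 2, 3]; samples = 2, evals = 3)
final_len[]  # 6 — within one sample the vector grew from 3 to 6 elements
```

This is why setting evals=1 makes setup effectively run before every call.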


Related: BenchmarkTools setup isn't run between each iteration?

ok, thanks guys. I can do

out = @btime test_btime_inplace!(array) setup=(array=[1, 2, 3]) evals=1

and indeed out will be [1, 2, 3, 4], which is good.

However, the actual benchmark result is now

54.000 ns (1 allocation: 64 bytes)

while the old benchmark,

out = @btime test_btime_inplace!(array)


gave 7.492 ns (0 allocations: 0 bytes)

So something is weird here I think?

I suspect you’re seeing an artifact of the amortized constant time of push!. The longer the vector gets, the less frequently its capacity changes, so the more likely it is that your benchmark loop happens to catch a run of pushes that does not contain a capacity change (and thus no memory allocation).
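That amortized behavior can be observed directly with @allocated: only a handful of pushes out of many actually allocate, namely when the backing buffer has to grow. A small sketch (count_reallocs is a hypothetical helper of my own naming):

```julia
# Count how many push! calls actually allocate: push! grows the backing
# buffer geometrically, so most pushes just reuse spare capacity.
function count_reallocs(n)
    v = Int[]
    reallocs = 0
    for i in 1:n
        reallocs += (@allocated push!(v, i)) > 0
    end
    return reallocs
end

count_reallocs(1000)  # only a handful of growth events out of 1000 pushes
```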

In other words: don’t worry about those few extra nanoseconds and the one allocation here 🙂


ok perfect, got it. This was really helpful altogether, thanks!