# BenchmarkTools: How to use different random array on each run (esp. for mutating functions)?

I am implementing a routine to find ‘a’ `median` of an integer array. The algorithm is based on quicksort, and both implementations below partially search/sort about half the array. Both have the same O(n) average complexity, so why is the second one so much slower than the first (despite taking roughly the same number of iterations on average)? Should I conclude that swapping elements in an array is slower than creating new arrays? If so, why? In particular, note that the slower one allocates zero memory!

``````
function quickmedian1(a, m=(length(a)+1)÷2, indices=(1:length(a)), left=0)
    pivot = a[indices[1]]
    indices₋ = [j for j in indices if a[j] < pivot]
    indices₌ = [j for j in indices if a[j] == pivot]

    if left + length(indices₋) + length(indices₌) < m
        indices₊ = [j for j in indices if a[j] > pivot]
        return quickmedian1(a, m, indices₊, left+length(indices₋)+length(indices₌))
    elseif left + length(indices₋) < m
        return pivot
    else
        return quickmedian1(a, m, indices₋, left)
    end
end

function quickmedian2!(a::AbstractArray{I}, m::Int=(length(a)+1)÷2, x::Int=1, y::Int=length(a)) where I
    i = x - 1
    for j in x:y
        if (a[j] < a[y]) || (j == y)
            i = i + 1
            a[j], a[i] = a[i], a[j]
        end
    end

    if i > m
        return quickmedian2!(a, m, x, i-1)
    elseif i < m
        return quickmedian2!(a, m, i+1, y)
    else
        return a[i]
    end
end

using Statistics
using BenchmarkTools

K, L = 8, 6

a = @benchmark median($(rand(0:10^K, 10^L)))
println("\n#################\nmedian")
display(a)
a = @benchmark quickmedian1($(rand(0:10^K, 10^L)))
println("\n#################\nquickmedian1")
display(a)
a = @benchmark quickmedian2!($(rand(0:10^K, 10^L)))
println("\n#################\nquickmedian2!")
display(a)

``````

Output:

``````
#################
median
BenchmarkTools.Trial:
memory estimate:  7.63 MiB
allocs estimate:  2
--------------
minimum time:     10.501 ms (0.00% GC)
median time:      10.915 ms (0.00% GC)
mean time:        11.576 ms (0.87% GC)
maximum time:     16.586 ms (1.16% GC)
--------------
samples:          431
evals/sample:     1
#################
quickmedian1
BenchmarkTools.Trial:
memory estimate:  44.78 MiB
allocs estimate:  584
--------------
minimum time:     53.406 ms (0.34% GC)
median time:      62.003 ms (0.36% GC)
mean time:        63.645 ms (3.40% GC)
maximum time:     119.835 ms (49.61% GC)
--------------
samples:          79
evals/sample:     1
#################
quickmedian2!
BenchmarkTools.Trial:
memory estimate:  0 bytes
allocs estimate:  0
--------------
minimum time:     13.134 s (0.00% GC)
median time:      13.134 s (0.00% GC)
mean time:        13.134 s (0.00% GC)
maximum time:     13.134 s (0.00% GC)
--------------
samples:          1
evals/sample:     1
``````

PS: Is there an easy way to display benchmark outputs? Also, when the benchmark runs the mutating function multiple times, it is not using fresh input each time — how can I remedy that?

With this benchmark setup

``````
K, L = 8, 6

A = rand(0:10^K, 10^L)
@btime median(copy($A))
@btime quickmedian1(copy($A))
@btime quickmedian2!(copy($A))
``````

I see

``````
  9.220 ms (4 allocations: 15.26 MiB)
57.990 ms (348 allocations: 65.40 MiB)
10.030 ms (2 allocations: 7.63 MiB)
``````

Edit: but this could be a benchmark mistake on my side. Are you sure you are implementing the correct algorithm? I’m observing that `m` doesn’t change in either implementation…


Thank you. That’s a relief! So I am benchmarking wrong.
Also firstly, if I call

``````
quickmedian2!(A)
quickmedian2!(A)
``````

The second run will take “a lot of” time, i.e. my algorithm performs very badly if the array is already sorted.
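That slowdown can be demonstrated without any timing at all by counting partition rounds. Below is a sketch using a hypothetical `quickmedian_count!` that mirrors the same last-element-pivot partition as `quickmedian2!`, but also returns how many recursive calls were made:

```julia
# Hypothetical counting variant of quickmedian2! (same partition scheme,
# pivot taken from a[y]); returns the selected element and the call count.
function quickmedian_count!(a, m=(length(a)+1)÷2, x=1, y=length(a), calls=Ref(0))
    calls[] += 1
    i = x - 1
    for j in x:y
        if a[j] < a[y] || j == y
            i += 1
            a[j], a[i] = a[i], a[j]
        end
    end
    if i > m
        return quickmedian_count!(a, m, x, i-1, calls)
    elseif i < m
        return quickmedian_count!(a, m, i+1, y, calls)
    else
        return a[i], calls[]
    end
end

using Random
Random.seed!(1)
n = 1001
# Sorted input: the pivot is always the maximum, so each call shrinks the
# range by only one element — O(n) calls, O(n^2) total work.
_, c_sorted = quickmedian_count!(collect(1:n))
# Shuffled input: expected O(log n) partition rounds.
_, c_shuffled = quickmedian_count!(shuffle(collect(1:n)))
println((c_sorted, c_shuffled))
```

On the sorted input the call count equals `n - m + 1`, which is exactly the quadratic-blowup pattern an already-sorted (or already-mutated) array triggers.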

In view of that,

1. Why is my benchmarking going haywire?
2. What is the correct way to do it?
3. How can I eliminate from the final output, the memory and time required for the `copy`?
4. What happens with `median($(copy(A)))`?
5. How can I get benchmarking to use different random sample in each run?
6. How can I see a nice histogram plot of times as shown in the README for `BenchmarkTools` ?

Or in summary: why the discrepancy in the `@btime`s below? What is each one reporting? What does the `$` do exactly?

``````
@btime quickmedian2!($(rand(0:10^K, 10^L)))
11.330 s (0 allocations: 0 bytes)
@btime quickmedian2!(rand(0:10^K, 10^L))
15.943 ms (6 allocations: 7.63 MiB)
``````

Thank you.

Oh, I can do that one.

``````
a = @benchmark quickmedian2!($(rand(0:10^K, 10^L)))
display(a)
``````

Just do

`@benchmark quickmedian2!($(rand(0:10^K, 10^L)))`

`@benchmark` runs the code multiple times to get statistics, so if your code mutates the inputs, you must make sure the benchmark doesn’t operate on mutated data.

There are a couple of alternatives:

``````
@benchmark foo!(A) setup=(A = rand(0:10^K, 10^L)) evals=1
``````

or

``````
B = rand(0:10^K, 10^L)
@benchmark foo!(A) setup=(A = copy(B)) evals=1
``````

`evals=1` ensures that only one evaluation happens on each array instance (i.e. one evaluation per `setup`); just make sure the work being timed is large enough to measure accurately.

The difference between the two examples is that the first creates a new random array for each sample, while the second makes a fresh copy of the same array each time.


The easiest fix is to pick a random element as your pivot; that makes it extremely unlikely to consistently pick bad pivots.
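For this thread’s `quickmedian2!`, that could look like the following sketch (a hypothetical `quickmedian3!`; the only change is swapping a randomly chosen element into the pivot position before partitioning):

```julia
using Random

# Hypothetical randomized-pivot variant of quickmedian2!.
function quickmedian3!(a::AbstractArray, m::Int=(length(a)+1)÷2,
                       x::Int=1, y::Int=length(a))
    # Move a random element to position y; the original partition loop,
    # which uses a[y] as the pivot, then works unchanged.
    p = rand(x:y)
    a[p], a[y] = a[y], a[p]

    i = x - 1
    for j in x:y
        if a[j] < a[y] || j == y
            i += 1
            a[j], a[i] = a[i], a[j]
        end
    end

    if i > m
        return quickmedian3!(a, m, x, i-1)
    elseif i < m
        return quickmedian3!(a, m, i+1, y)
    else
        return a[i]
    end
end
```

With a random pivot, an already-sorted (or previously mutated) input no longer degrades toward O(n²) in expectation, so repeated runs on the same array stay fast.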

What is the difference between samples and evaluations? How can I make sure there is only one eval per sample?