Avoid allocation of small arrays in median, extrema, etc

I have a performance-critical call that bounds a number. This code works like magic, but I want to understand why it doesn’t need to allocate:

julia> @benchmark clamp(v[1],extrema((v[2],v[3],v[4]))...) setup=(v=rand(4))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.571 ns (0.00% GC)
  median time:      9.320 ns (0.00% GC)
  mean time:        9.318 ns (0.00% GC)
  maximum time:     27.229 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

So again, I love this solution. It is extremely readable. But I need to create a tuple for extrema, which seems like it should require an allocation. For example, this is mathematically equivalent:

julia> @benchmark median((v[1],v[1],v[2],v[3],v[4])) setup=(v=rand(4))
BenchmarkTools.Trial: 
  memory estimate:  128 bytes
  allocs estimate:  1
  --------------
  minimum time:     94.199 ns (0.00% GC)
  median time:      103.288 ns (0.00% GC)
  mean time:        108.575 ns (1.64% GC)
  maximum time:     1.196 μs (81.55% GC)
  --------------
  samples:          10000
  evals/sample:     957

but it requires an allocation and so is 10× slower. I tried median(SArray(v[1],v[1],v[2],v[3],v[4])) and median((v[1],v[2],median((v[1],v[3],v[4])))), but they all allocate.

Is this something special about extrema, or something general about Julia I should understand? Or maybe BenchmarkTools is lying to me?

extrema iterates over the elements and keeps a running min and max, so it does not allocate. See line 1711 of julia/multidimensional.jl at master in the JuliaLang/julia repository on GitHub.
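The iterate-and-track pattern is easy to sketch; `myextrema` below is a hypothetical name for illustration, not the actual Base implementation:

```julia
# A minimal sketch of a single-pass extrema: iterate once, tracking the
# running min and max, with no temporary storage.
function myextrema(itr)
    y = iterate(itr)
    y === nothing && throw(ArgumentError("collection must be non-empty"))
    v, state = y
    lo = hi = v
    while (y = iterate(itr, state)) !== nothing
        v, state = y
        lo = min(lo, v)
        hi = max(hi, v)
    end
    return (lo, hi)
end

myextrema((3, 1, 2))  # (1, 3)
```

Because the tuple argument lives on the stack and the loop only updates two local variables, nothing here touches the heap.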

median needs to sort and therefore allocate a vector.

If you can pass v as a Vector, median! does not allocate:

julia> @benchmark median!(v) setup=(v=rand(4))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.952 ns (0.00% GC)
  median time:      16.424 ns (0.00% GC)
  mean time:        17.729 ns (0.00% GC)
  maximum time:     91.392 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

Thanks, that makes sense. Unfortunately, the bounds aren’t constant, so I would need to refill the whole array every call.

In this case it looks like a custom function wins the race!

@fastmath function median(a,b,c)
    x = a-b
    if x*(b-c) ≥ 0
        return b
    elseif x*(a-c) > 0
        return c
    else
        return a
    end
end
julia> @benchmark median(v[2],v[1],median(v[3],v[1],v[4])) setup=(v=rand(4))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.399 ns (0.00% GC)
  median time:      3.400 ns (0.00% GC)
  mean time:        3.264 ns (0.00% GC)
  maximum time:     26.301 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Plus, this code is especially fast when the value is within the bounds (which is usually the case).

julia> @benchmark median(v[1],m,median(v[2],m,v[3])) setup=(v=rand(3);m=mean(v))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.399 ns (0.00% GC)
  median time:      2.500 ns (0.00% GC)
  mean time:        2.769 ns (0.00% GC)
  maximum time:     75.501 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I think you can make that function faster and more self-explanatory by replacing the multiplications with comparisons.

if a>b
    b>=c && return b
    a>c && return c
else
    b<=c && return b
    a<c && return c
end
return a

(Disclaimer: I’m on my phone and not entirely sure what your function is supposed to do, so this might be a bit misspelled, but the idea holds.)
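For what it’s worth, that comparison chain can be brute-force checked against a sorting-based median over all orderings and ties; `compmedian3` is just a hypothetical wrapper around the snippet above so it runs standalone:

```julia
# Wrap the suggested comparison chain in a function and verify it against
# the sorting-based median for every ordering, including ties.
function compmedian3(a, b, c)
    if a > b
        b >= c && return b
        a > c && return c
    else
        b <= c && return b
        a < c && return c
    end
    return a
end

# sort([a, b, c])[2] is the reference median of three values.
for a in 1:3, b in 1:3, c in 1:3
    @assert compmedian3(a, b, c) == sort([a, b, c])[2]
end
```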


That’s the same speed on my system, but you’re right that it is more readable. Thanks!

If you rewrite it

function fun(a, b, c)
    (a>b)*(b>=c) && return b
    (a>b)*(a>c) && return c
    return a
end

it only has to perform boolean multiplications:
[plot “medians”: benchmark timing distributions for each implementation]

“newcompmedian” is the new one. I’m guessing the three modes there correspond to short-circuiting after one, two, or three comparisons.


Cool! Can you post the code for this? I would love to take a look.

I did it in the REPL (stupidly), but:

using BenchmarkTools, StatsPlots, BenchmarkPlots # Requires this pull request: https://github.com/JuliaCI/BenchmarkTools.jl/pull/194/commits/230663872002eae1f9513590f68322636f2fa05d
default(fontfamily="Computer Modern")

function mulmedian(a, b, c)
    x = a-b
    if x*(b-c) >= 0
        return b
    elseif x*(a-c) > 0
        return c
    else
        return a
    end
end

function compmedian(a, b, c)
    if a>b
        b>=c && return b
        a>c && return c
    else
        b<=c && return b
        a<c && return c
    end
    return a
end
function newcompmedian(a, b, c)
    xor((a>b), (b<=c)) && return b
    xor((a>b), (a<c)) && return c
    return a
end

clampfun(v) = clamp(v[1], extrema((v[2],v[3],v[4]))...)
medianfun(v) = median((v[1],v[1],v[2],v[3],v[4]))
mulmedianfun(v) = mulmedian(v[2], v[1], mulmedian(v[3], v[1], v[4]))
compmedianfun(v) = compmedian(v[2], v[1], compmedian(v[3], v[1], v[4]))
newcompmedianfun(v) = newcompmedian(v[2], v[1], newcompmedian(v[3], v[1], v[4]))

suite = BenchmarkGroup()
suite["clamp"] = @benchmarkable clampfun(v) setup=(v=rand(4))
suite["median"] = @benchmarkable medianfun(v) setup=(v=rand(4))
suite["mulmedian"] = @benchmarkable mulmedianfun(v) setup=(v=rand(4))
suite["compmedian"] = @benchmarkable compmedianfun(v) setup=(v=rand(4))
suite["newcompmedian"] = @benchmarkable newcompmedianfun(v) setup=(v=rand(4))

results = run(suite)
plot(results; yscale=:log10)

Of note here, though: (a>b)*(b >= c) is not equivalent to (a-b)*(b-c) >= 0

julia> a, b, c = 1, 2, 3
(1, 2, 3)

julia> (a > b) *(b >= c)
false

julia> (a - b)*(b - c) >=0
true

Yeah, you’re right. I meant xor. Which actually makes it a bit faster. Edited.

I do not quite understand why xor is better:

julia> xor((a>b), (b>=c))
false

I think the proper formula is

julia> (a == b) | (b == c) | ((a > b)&(b>c)) | ((a < b)&(b < c))
true

but it is rather slow.

If (a-b)*(b-c) >= 0, then both parentheses are >=0, or they are both <=0. Depending on how sensitive the application is to the difference between <= and < (I still don’t quite understand what we are doing here, but it’s some kind of sorting/clamping thing so it shouldn’t matter), you can use xor((a>=b), (b<c)). Or some other combination of comparisons. Shouldn’t matter much for the timings.
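The sign argument can be checked exhaustively on small integers (a sketch; ties are exactly where the strict/non-strict variants differ, so they are carved out explicitly):

```julia
# (a-b)*(b-c) >= 0 holds exactly when both factors have the same sign,
# or when either factor is zero (i.e. a == b or b == c).
for a in -2:2, b in -2:2, c in -2:2
    product_test = (a - b) * (b - c) >= 0
    comparison_test = !xor(a >= b, b >= c) || a == b || b == c
    @assert product_test == comparison_test
end
```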

We’re ensuring that a number is within the extrema of three other numbers. The clamp function version is certainly the most direct and since it doesn’t use any custom code, it could be used to unit test any of the other versions.
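Such a unit test can be sketched directly; `median3` is a hypothetical stand-in for whichever candidate implementation is being checked (here it wraps `Statistics.median` as a known-good reference):

```julia
using Statistics

# `median3` stands in for the 3-argument median under test; here it just
# wraps Statistics.median so the property below is trivially satisfied.
median3(a, b, c) = median([a, b, c])

# The clamp formulation is the specification: the median of three numbers
# equals one of them clamped between the min and max of the other two.
for _ in 1:1000
    a, b, c = rand(3)
    @assert median3(a, b, c) == clamp(a, extrema((b, c))...)
end
```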

I’m happy to do so but (dumb question) how do I get your PR above? Do I need to git fetch https://github.com/JuliaCI/BenchmarkTools.jl/pull/194/commits/230663872002eae1f9513590f68322636f2fa05d

I’m not sure if there’s a native way to select a specific commit, but ]dev https://github.com/gustaphe/BenchmarkTools.jl/ should do the job for now because it’s on my master branch.


Perhaps a bit of a tangent, but here’s an attempt at a non-allocating median for tuples, by simply listing all the options. This will end up making the same comparison many times and only works for short tuples; surely there is a better algorithm:

using Combinatorics, Statistics
function _medex(n)
    out = quote end
    for p in permutations(1:n)
        z = zip(Iterators.repeated(:(<=)), [:(x[$i]) for i in p])
        c = Expr(:comparison, Iterators.drop(Iterators.flatten(z),1)...)
        v = isodd(n) ? :(x[$(p[1+n÷2])]) : :((x[$(p[n÷2])] + x[$(p[1+n÷2])])/2)
        push!(out.args, :($c && return $v))
    end
    out
end
_medex(3) # to see what this generates
@generated unrollmedian(x::Tuple) = _medex(length(x.parameters))
unrollmedian(xs::Number...) = unrollmedian(xs)
for n in 2:6
    x = Tuple(rand(Int8,n))
    @show n
    @btime median($(Ref(x))[])
    @btime unrollmedian($(Ref(x))[])  # at n=7 I have to kill julia 
end

using Random, Statistics, BenchmarkTools
Statistics.median(xs::Number...) = median(xs) # to match @gustaphe's functions
Random.seed!(1);
xs = Tuple(rand(Int8, 3)) # (64, 82, -103)
for f in [median, unrollmedian, fastmath_median, fun, mulmedian, compmedian, newcompmedian]    
    @show f f(xs...)
    @btime $f($(Ref(xs))[]...)
end

This comparison prints the following. Note that several functions above give the wrong answer; I think they assume that all numbers are positive. I’m not sure how meaningful these tiny times are:

f = Statistics.median
f(xs...) = -82.0
  26.439 ns (1 allocation: 96 bytes)
f = unrollmedian
f(xs...) = -82
  1.208 ns (0 allocations: 0 bytes)
f = fastmath_median
f(xs...) = 103
  0.916 ns (0 allocations: 0 bytes)
f = fun
f(xs...) = -82
  1.292 ns (0 allocations: 0 bytes)
f = mulmedian
f(xs...) = 103
  0.916 ns (0 allocations: 0 bytes)
f = compmedian
f(xs...) = -82
  1.166 ns (0 allocations: 0 bytes)
f = newcompmedian
f(xs...) = -82
  1.333 ns (0 allocations: 0 bytes)

I can confirm that newcompmedian doesn’t work: if a<b<c, then xor(a>b, b>=c) = xor(false, false) = false, which means it misses b as the median.
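The failure is easy to reproduce; `badmedian` below is a hypothetical reconstruction of the `b >= c` variant being discussed:

```julia
# The xor variant with b >= c misses b when a < b < c: both tests come
# out false, so the function falls through and returns a instead of b.
function badmedian(a, b, c)
    xor(a > b, b >= c) && return b
    xor(a > b, a > c) && return c
    return a
end

badmedian(1, 2, 3)  # returns 1; the true median is 2
```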

This one seems to work

function eqcompmedian(a, b, c)
    (a>b) == (b>=c) && return b
    (a>b) == (a>c) && return c
    return a
end
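A cheap sanity check is to compare it against `Statistics.median` over all small signed triples, including the negative values that tripped up the arithmetic versions (the definition is repeated so the snippet is self-contained):

```julia
using Statistics

function eqcompmedian(a, b, c)
    (a > b) == (b >= c) && return b
    (a > b) == (a > c) && return c
    return a
end

# Exhaustive check against Statistics.median, covering ties and
# negative values (no subtraction, so no overflow concerns either).
for a in -3:3, b in -3:3, c in -3:3
    @assert eqcompmedian(a, b, c) == median([a, b, c])
end
```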

I think this is exactly the same logical chain as mulmedian. When doing the timing, I get

julia> run(suite)
5-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "median" => Trial(144.000 ns)
  "compmedian" => Trial(43.000 ns)
  "eqcompmedian" => Trial(45.000 ns)
  "clamp" => Trial(63.000 ns) #  this one seems quite variable??
  "mulmedian" => Trial(47.000 ns)

So compmedian is the best by a hair.

Yeah, it would have to be opposite comparisons. (a>b) == (b>=c) <==> ~xor((a>b), (b<c)). Don’t know if there’s any performance implication.

That works! On my machine, all of the double-median versions are essentially the same. Clamp is a little slower (actually, still pretty much the same), and the single vectorized median is much worse due to the allocation.

@mcabbot, in case it’s of interest: for larger lists of numbers, a median-finding algorithm that is seemingly faster than sort-based methods is presented in a simple way here, and the author also points to a more advanced reference on this topic by Andrei Alexandrescu.


There is also an approximate median algorithm, which is claimed to be efficient: https://link.springer.com/chapter/10.1007%2F3-540-46521-9_19
