Avoid allocation of small arrays in median, extrema, etc

I have a performance-critical call that bounds a number. This code works like magic, but I want to understand why it doesn’t need to allocate:

julia> @benchmark clamp(v[1],extrema((v[2],v[3],v[4]))...) setup=(v=rand(4))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.571 ns (0.00% GC)
  median time:      9.320 ns (0.00% GC)
  mean time:        9.318 ns (0.00% GC)
  maximum time:     27.229 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

So again, I love this solution. It is extremely readable. But I need to create a tuple for extrema, which seems like it should require an allocation. For example, this is mathematically equivalent:

julia> @benchmark median((v[1],v[1],v[2],v[3],v[4])) setup=(v=rand(4))
BenchmarkTools.Trial: 
  memory estimate:  128 bytes
  allocs estimate:  1
  --------------
  minimum time:     94.199 ns (0.00% GC)
  median time:      103.288 ns (0.00% GC)
  mean time:        108.575 ns (1.64% GC)
  maximum time:     1.196 μs (81.55% GC)
  --------------
  samples:          10000
  evals/sample:     957

but it requires an allocation and so is 10× slower. I tried median(SArray(v[1],v[1],v[2],v[3],v[4])) and median((v[1],v[2],median((v[1],v[3],v[4])))), but they all allocate.

Is this something special about extrema, or something general about Julia I should understand? Or maybe BenchmarkTools is lying to me?

extrema iterates over the elements and keeps a running min and max, so it does not allocate. See line 1711 of julia/multidimensional.jl at master in the JuliaLang/julia repository on GitHub.
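The iterate-and-track pattern is easy to sketch; `myextrema` below is a hypothetical name for illustration, not the actual Base implementation:

```julia
# A minimal sketch of a single-pass extrema: iterate once, tracking the
# running min and max, with no temporary storage.
function myextrema(itr)
    y = iterate(itr)
    y === nothing && throw(ArgumentError("collection must be non-empty"))
    v, state = y
    lo = hi = v
    while (y = iterate(itr, state)) !== nothing
        v, state = y
        lo = min(lo, v)
        hi = max(hi, v)
    end
    return (lo, hi)
end

myextrema((3, 1, 2))  # (1, 3)
```

Because the tuple argument lives on the stack and the loop only updates two local variables, nothing here touches the heap.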

median needs to sort and therefore allocate a vector.

If you can pass v as a Vector, median! does not allocate:

julia> @benchmark median!(v) setup=(v=rand(4))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.952 ns (0.00% GC)
  median time:      16.424 ns (0.00% GC)
  mean time:        17.729 ns (0.00% GC)
  maximum time:     91.392 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

Thanks, that makes sense. Unfortunately, the bounds aren’t constant, so I would need to refill the whole array every call.

In this case it looks like a custom function wins the race!

@fastmath function median(a,b,c)
    x = a-b
    if x*(b-c) ≥ 0
        return b
    elseif x*(a-c) > 0
        return c
    else
        return a
    end
end
julia> @benchmark median(v[2],v[1],median(v[3],v[1],v[4])) setup=(v=rand(4))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.399 ns (0.00% GC)
  median time:      3.400 ns (0.00% GC)
  mean time:        3.264 ns (0.00% GC)
  maximum time:     26.301 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

Plus, this code is especially fast when the value is within the bounds (which is usually the case).

julia> @benchmark median(v[1],m,median(v[2],m,v[3])) setup=(v=rand(3);m=mean(v))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.399 ns (0.00% GC)
  median time:      2.500 ns (0.00% GC)
  mean time:        2.769 ns (0.00% GC)
  maximum time:     75.501 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

I think you can make that function faster and more self-explanatory by replacing the multiplications with comparisons.

if a>b
    b>=c && return b
    a>c && return c
else
    b<=c && return b
    a<c && return c
end
return a

(Disclaimer: I’m on my phone and not entirely sure what your function is supposed to do, so this might be a bit misspelled, but the idea holds.)
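For what it’s worth, that comparison chain can be brute-force checked against a sorting-based median over all orderings and ties; `compmedian3` is just a hypothetical wrapper around the snippet above so it runs standalone:

```julia
# Wrap the suggested comparison chain in a function and verify it against
# the sorting-based median for every ordering, including ties.
function compmedian3(a, b, c)
    if a > b
        b >= c && return b
        a > c && return c
    else
        b <= c && return b
        a < c && return c
    end
    return a
end

# sort([a, b, c])[2] is the reference median of three values.
for a in 1:3, b in 1:3, c in 1:3
    @assert compmedian3(a, b, c) == sort([a, b, c])[2]
end
```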


That’s the same speed on my system, but you’re right that it is more readable. Thanks!

If you rewrite it

function fun(a, b, c)
    (a>b)*(b>=c) && return b
    (a>b)*(a>c) && return c
    return a
end

it only has to perform boolean multiplications:
[plot “medians”: benchmark timing distributions for each implementation]

“newcompmedian” is the new one. I’m guessing the three modes there correspond to short-circuiting after one, two, or three comparisons.


Cool! Can you post the code for this? I would love to take a look.

I did it in the REPL (stupidly), but:

using BenchmarkTools, StatsPlots, BenchmarkPlots # Requires this pull request: https://github.com/JuliaCI/BenchmarkTools.jl/pull/194/commits/230663872002eae1f9513590f68322636f2fa05d
default(fontfamily="Computer Modern")

function mulmedian(a, b, c)
    x = a-b
    if x*(b-c) >= 0
        return b
    elseif x*(a-c) > 0
        return c
    else
        return a
    end
end

function compmedian(a, b, c)
    if a>b
        b>=c && return b
        a>c && return c
    else
        b<=c && return b
        a<c && return c
    end
    return a
end
function newcompmedian(a, b, c)
    xor((a>b), (b<=c)) && return b
    xor((a>b), (a<c)) && return c
    return a
end

clampfun(v) = clamp(v[1], extrema((v[2],v[3],v[4]))...)
medianfun(v) = median((v[1],v[1],v[2],v[3],v[4]))
mulmedianfun(v) = mulmedian(v[2], v[1], mulmedian(v[3], v[1], v[4]))
compmedianfun(v) = compmedian(v[2], v[1], compmedian(v[3], v[1], v[4]))
newcompmedianfun(v) = newcompmedian(v[2], v[1], newcompmedian(v[3], v[1], v[4]))

suite = BenchmarkGroup()
suite["clamp"] = @benchmarkable clampfun(v) setup=(v=rand(4))
suite["median"] = @benchmarkable medianfun(v) setup=(v=rand(4))
suite["mulmedian"] = @benchmarkable mulmedianfun(v) setup=(v=rand(4))
suite["compmedian"] = @benchmarkable compmedianfun(v) setup=(v=rand(4))
suite["newcompmedian"] = @benchmarkable newcompmedianfun(v) setup=(v=rand(4))

results = run(suite)
plot(results; yscale=:log10)

Of note here, though: (a>b)*(b >= c) is not equivalent to (a-b)*(b-c) >= 0

julia> a, b, c = 1, 2, 3
(1, 2, 3)

julia> (a > b) *(b >= c)
false

julia> (a - b)*(b - c) >=0
true

Yeah, you’re right. I meant xor. Which actually makes it a bit faster. Edited.

I do not quite understand why xor is better:

julia> xor((a>b), (b>=c))
false

I think the proper formula is

julia> (a == b) | (b == c) | ((a > b)&(b>c)) | ((a < b)&(b < c))
true

but it is rather slow.

If (a-b)*(b-c) >= 0, then both parentheses are >=0, or they are both <=0. Depending on how sensitive the application is to the difference between <= and < (I still don’t quite understand what we are doing here, but it’s some kind of sorting/clamping thing so it shouldn’t matter), you can use xor((a>=b), (b<c)). Or some other combination of comparisons. Shouldn’t matter much for the timings.
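The sign argument can be checked exhaustively on small integers (a sketch; ties are exactly where the strict/non-strict variants differ, so they are carved out explicitly):

```julia
# (a-b)*(b-c) >= 0 holds exactly when both factors have the same sign,
# or when either factor is zero (i.e. a == b or b == c).
for a in -2:2, b in -2:2, c in -2:2
    product_test = (a - b) * (b - c) >= 0
    comparison_test = !xor(a >= b, b >= c) || a == b || b == c
    @assert product_test == comparison_test
end
```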

We’re ensuring that a number is within the extrema of three other numbers. The clamp function version is certainly the most direct and since it doesn’t use any custom code, it could be used to unit test any of the other versions.
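Such a unit test can be sketched directly; `median3` is a hypothetical stand-in for whichever candidate implementation is being checked (here it wraps `Statistics.median` as a known-good reference):

```julia
using Statistics

# `median3` stands in for the 3-argument median under test; here it just
# wraps Statistics.median so the property below is trivially satisfied.
median3(a, b, c) = median([a, b, c])

# The clamp formulation is the specification: the median of three numbers
# equals one of them clamped between the min and max of the other two.
for _ in 1:1000
    a, b, c = rand(3)
    @assert median3(a, b, c) == clamp(a, extrema((b, c))...)
end
```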

I’m happy to do so but (dumb question) how do I get your PR above? Do I need to git fetch https://github.com/JuliaCI/BenchmarkTools.jl/pull/194/commits/230663872002eae1f9513590f68322636f2fa05d

I’m not sure if there’s a native way to select a specific commit, but ]dev https://github.com/gustaphe/BenchmarkTools.jl/ should do the job for now because it’s on my master branch.


Perhaps a bit of a tangent, but here’s an attempt at a non-allocating median for tuples, by simply listing all the options. This will end up making the same comparison many times and only works for short tuples; surely there is a better algorithm:

using Combinatorics, Statistics
function _medex(n)
    out = quote end
    for p in permutations(1:n)
        z = zip(Iterators.repeated(:(<=)), [:(x[$i]) for i in p])
        c = Expr(:comparison, Iterators.drop(Iterators.flatten(z),1)...)
        v = isodd(n) ? :(x[$(p[1+n÷2])]) : :((x[$(p[n÷2])] + x[$(p[1+n÷2])])/2)
        push!(out.args, :($c && return $v))
    end
    out
end
_medex(3) # to see what this generates
@generated unrollmedian(x::Tuple) = _medex(length(x.parameters))
unrollmedian(xs::Number...) = unrollmedian(xs)
for n in 2:6
    x = Tuple(rand(Int8,n))
    @show n
    @btime median($(Ref(x))[])
    @btime unrollmedian($(Ref(x))[])  # at n=7 I have to kill julia 
end

using Random, Statistics, BenchmarkTools
Statistics.median(xs::Number...) = median(xs) # to match @gustaphe's functions
Random.seed!(1);
xs = Tuple(rand(Int8, 3)) # (64, 82, -103)
for f in [median, unrollmedian, fastmath_median, fun, mulmedian, compmedian, newcompmedian]    
    @show f f(xs...)
    @btime $f($(Ref(xs))[]...)
end

This comparison prints the following. Note that several functions above give the wrong answer; I think they assume that all numbers are positive. I’m not sure how meaningful these tiny times are:

f = Statistics.median
f(xs...) = -82.0
  26.439 ns (1 allocation: 96 bytes)
f = unrollmedian
f(xs...) = -82
  1.208 ns (0 allocations: 0 bytes)
f = fastmath_median
f(xs...) = 103
  0.916 ns (0 allocations: 0 bytes)
f = fun
f(xs...) = -82
  1.292 ns (0 allocations: 0 bytes)
f = mulmedian
f(xs...) = 103
  0.916 ns (0 allocations: 0 bytes)
f = compmedian
f(xs...) = -82
  1.166 ns (0 allocations: 0 bytes)
f = newcompmedian
f(xs...) = -82
  1.333 ns (0 allocations: 0 bytes)

I can confirm that newcompmedian doesn’t work: if a<b<c, then xor(a>b, b>=c) = xor(false, false) = false, which means it misses b as the median.
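The failure is easy to reproduce; `badmedian` below is a hypothetical reconstruction of the `b >= c` variant being discussed:

```julia
# The xor variant with b >= c misses b when a < b < c: both tests come
# out false, so the function falls through and returns a instead of b.
function badmedian(a, b, c)
    xor(a > b, b >= c) && return b
    xor(a > b, a > c) && return c
    return a
end

badmedian(1, 2, 3)  # returns 1; the true median is 2
```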

This one seems to work

function eqcompmedian(a, b, c)
    (a>b) == (b>=c) && return b
    (a>b) == (a>c) && return c
    return a
end
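A cheap sanity check is to compare it against `Statistics.median` over all small signed triples, including the negative values that tripped up the arithmetic versions (the definition is repeated so the snippet is self-contained):

```julia
using Statistics

function eqcompmedian(a, b, c)
    (a > b) == (b >= c) && return b
    (a > b) == (a > c) && return c
    return a
end

# Exhaustive check against Statistics.median, covering ties and
# negative values (no subtraction, so no overflow concerns either).
for a in -3:3, b in -3:3, c in -3:3
    @assert eqcompmedian(a, b, c) == median([a, b, c])
end
```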

I think this is exactly the same logical chain as mulmedian. When doing the timing, I get

julia> run(suite)
5-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "median" => Trial(144.000 ns)
  "compmedian" => Trial(43.000 ns)
  "eqcompmedian" => Trial(45.000 ns)
  "clamp" => Trial(63.000 ns) #  this one seems quite variable??
  "mulmedian" => Trial(47.000 ns)

So compmedian is the best by a hair.

Yeah, it would have to be opposite comparisons. (a>b) == (b>=c) <==> ~xor((a>b), (b<c)). Don’t know if there’s any performance implication.

That works! On my machine, all of the double-median versions are essentially the same. Clamp is a little slower (actually, still pretty much the same), and the single vectorized median is much worse due to the allocation.

@mcabbot, in case it’s of interest: for larger lists of numbers, a median-finding algorithm that is seemingly faster than sort-based methods is presented in a simple way here, and the author also points to a more advanced reference on this topic by Andrei Alexandrescu.


There is also an approximate median algorithm, which is claimed to be efficient: https://link.springer.com/chapter/10.1007%2F3-540-46521-9_19
