Median calculation performing better than expected

Nino_Rode · January 18, 2023, 5:20pm

I have an “it works … why?” question. I hope this is the right place.

I’m trying to learn to write julia code to have as good performance as possible (yes, I’m new with julia). Most of the time, I’m trying to reproduce the performance of the julia libraries (not always successfully, of course). However, in my experiments I managed to find an algorithm for finding median, which outperforms the built-in median function. According to benchmarks, my function outperforms the built-in one 1:10.

Here is the code

using BenchmarkTools

function medianof4(a::Number, b::Number, c::Number, d::Number)

    if a > c
        a, c = c, a
    end
    if b > d
        b, d = d, b
    end
    if a > b
        if c > d
            return a, d
        else
            return a, c
        end
    else
        if c > d
            return b, d
        else
            return b, c
        end
    end
end

function medianof3(a::Number, b::Number, c::Number)

    if a > c
        a, c = c, a
    end
    if a > b
        if b > c
            return b
        elseif a > c
            return c
        else
            return a
        end
    else
        if b < c
            return b
        elseif a > c
            return a
        else
            return c
        end
    end
end

"""
    linmedian(data)

Linear algorithm for finding median of a vector or some part of it.
"""
function linmedian(data; l=1, h=length(data), mode="middle")

    ml, odd = divrem(h, 2)
    mh = ml + 1
    tmpl, tmph = data[ml], data[mh]

    if odd == 0
        while l < ml
            tmpl, tmph = medianof4(data[l], tmpl, tmph, data[h])
        l += 1
        h -= 1
        end

        if mode == "higher" || mode == "high" || mode == "hi" || mode == "h"
            return tmph
        elseif mode == "lower" || mode == "low" || mode == "lo" || mode == "l"
            return tmpl
        else
            return (tmpl + tmph) / 2
        end
    else
        while l < ml
            tmph = medianof3(data[l], tmph, data[h])
            l += 1
            h -= 1
        end 

        return tmph
    end
end

d = rand(0:99, 12000)

@benchmark linmedian($d)

@benchmark median($d)

OK, I did optimize everything I knew how, and I’m not testing for consistency of input and all that, but still, I’m surprised. Am I doing something wrong (I double-checked the results, output is correct)? Did I overlook something? Did I, per chance, find a faster algorithm for median?

I would really like to hear your opinion.

rafael.guerra · January 18, 2023, 5:29pm

I am getting quite different results:

d = rand(0:99, 12000)

julia> linmedian(d)
32.0

julia> median(d)
49.0

Nino_Rode · January 18, 2023, 5:37pm

OOPS!

That changes things. I’ll look into it …

Nino_Rode · January 18, 2023, 6:19pm

Yes, the algorithm is flawed.
No wonder it is quick (I found the problem, so back to the drawing board …)

Sorry for your time.

Topic		Replies	Views
Avoid allocation of small arrays in median, extrema, etc Performance memory-allocation	23	1076	March 28, 2021
How to speed up `Statistics.median` along a given dimension? Performance question , python	9	1512	January 11, 2020
Median doesn't behave like mean on a vector of vectors General Usage	3	283	December 6, 2023
C implementation of function being ~4 times faster even absence of allocs Performance ccall	28	1398	March 5, 2025
How to make this custom median for color images faster Visualization question , images , juliaimages	2	360	October 6, 2023

Median calculation performing better than expected

Related topics