Median calculation performing better than expected

I have an “it works … why?” question. I hope this is the right place.

I’m trying to learn to write julia code to have as good performance as possible (yes, I’m new with julia). Most of the time, I’m trying to reproduce the performance of the julia libraries (not always successfully, of course). However, in my experiments I managed to find an algorithm for finding median, which outperforms the built-in median function. According to benchmarks, my function outperforms the built-in one 1:10.

Here is the code

using BenchmarkTools

function medianof4(a::Number, b::Number, c::Number, d::Number)

    if a > c
        a, c = c, a
    end
    if b > d
        b, d = d, b
    end
    if a > b
        if c > d
            return a, d
        else
            return a, c
        end
    else
        if c > d
            return b, d
        else
            return b, c
        end
    end
end

function medianof3(a::Number, b::Number, c::Number)

    if a > c
        a, c = c, a
    end
    if a > b
        if b > c
            return b
        elseif a > c
            return c
        else
            return a
        end
    else
        if b < c
            return b
        elseif a > c
            return a
        else
            return c
        end
    end
end

"""
    linmedian(data)

Linear algorithm for finding median of a vector or some part of it.
"""
function linmedian(data; l=1, h=length(data), mode="middle")

    ml, odd = divrem(h, 2)
    mh = ml + 1
    tmpl, tmph = data[ml], data[mh]

    if odd == 0
        while l < ml
            tmpl, tmph = medianof4(data[l], tmpl, tmph, data[h])
        l += 1
        h -= 1
        end

        if mode == "higher" || mode == "high" || mode == "hi" || mode == "h"
            return tmph
        elseif mode == "lower" || mode == "low" || mode == "lo" || mode == "l"
            return tmpl
        else
            return (tmpl + tmph) / 2
        end
    else
        while l < ml
            tmph = medianof3(data[l], tmph, data[h])
            l += 1
            h -= 1
        end 

        return tmph
    end
end

d = rand(0:99, 12000)

@benchmark linmedian($d)

@benchmark median($d)

OK, I did optimize everything I knew how, and I’m not testing for consistency of input and all that, but still, I’m surprised. Am I doing something wrong (I double-checked the results, output is correct)? Did I overlook something? Did I, per chance, find a faster algorithm for median?

I would really like to hear your opinion.

I am getting quite different results:

d = rand(0:99, 12000)

julia> linmedian(d)
32.0

julia> median(d)
49.0
1 Like

OOPS!

That changes things. I’ll look into it …

Yes, the algorithm is flawed. :flushed:
No wonder it is quick :stuck_out_tongue_closed_eyes: (I found the problem, so back to the drawing board …)

Sorry for your time.

4 Likes