# Median calculation performing better than expected

I have an “it works … why?” question. I hope this is the right place.

I’m trying to learn to write julia code to have as good performance as possible (yes, I’m new with julia). Most of the time, I’m trying to reproduce the performance of the julia libraries (not always successfully, of course). However, in my experiments I managed to find an algorithm for finding median, which outperforms the built-in median function. According to benchmarks, my function outperforms the built-in one 1:10.

Here is the code

``````using BenchmarkTools

function medianof4(a::Number, b::Number, c::Number, d::Number)

if a > c
a, c = c, a
end
if b > d
b, d = d, b
end
if a > b
if c > d
return a, d
else
return a, c
end
else
if c > d
return b, d
else
return b, c
end
end
end

function medianof3(a::Number, b::Number, c::Number)

if a > c
a, c = c, a
end
if a > b
if b > c
return b
elseif a > c
return c
else
return a
end
else
if b < c
return b
elseif a > c
return a
else
return c
end
end
end

"""
linmedian(data)

Linear algorithm for finding median of a vector or some part of it.
"""
function linmedian(data; l=1, h=length(data), mode="middle")

ml, odd = divrem(h, 2)
mh = ml + 1
tmpl, tmph = data[ml], data[mh]

if odd == 0
while l < ml
tmpl, tmph = medianof4(data[l], tmpl, tmph, data[h])
l += 1
h -= 1
end

if mode == "higher" || mode == "high" || mode == "hi" || mode == "h"
return tmph
elseif mode == "lower" || mode == "low" || mode == "lo" || mode == "l"
return tmpl
else
return (tmpl + tmph) / 2
end
else
while l < ml
tmph = medianof3(data[l], tmph, data[h])
l += 1
h -= 1
end

return tmph
end
end

d = rand(0:99, 12000)

@benchmark linmedian(\$d)

@benchmark median(\$d)

``````

OK, I did optimize everything I knew how, and I’m not testing for consistency of input and all that, but still, I’m surprised. Am I doing something wrong (I double-checked the results, output is correct)? Did I overlook something? Did I, per chance, find a faster algorithm for median?

I would really like to hear your opinion.

I am getting quite different results:

``````d = rand(0:99, 12000)

julia> linmedian(d)
32.0

julia> median(d)
49.0
``````
1 Like

OOPS!

That changes things. I’ll look into it …

Yes, the algorithm is flawed. No wonder it is quick (I found the problem, so back to the drawing board …)