I have an “it works … why?” question. I hope this is the right place.
I’m trying to learn to write julia code to have as good performance as possible (yes, I’m new with julia). Most of the time, I’m trying to reproduce the performance of the julia libraries (not always successfully, of course). However, in my experiments I managed to find an algorithm for finding median, which outperforms the built-in median function. According to benchmarks, my function outperforms the built-in one 1:10.
Here is the code
using BenchmarkTools
function medianof4(a::Number, b::Number, c::Number, d::Number)
if a > c
a, c = c, a
end
if b > d
b, d = d, b
end
if a > b
if c > d
return a, d
else
return a, c
end
else
if c > d
return b, d
else
return b, c
end
end
end
function medianof3(a::Number, b::Number, c::Number)
if a > c
a, c = c, a
end
if a > b
if b > c
return b
elseif a > c
return c
else
return a
end
else
if b < c
return b
elseif a > c
return a
else
return c
end
end
end
"""
linmedian(data)
Linear algorithm for finding median of a vector or some part of it.
"""
function linmedian(data; l=1, h=length(data), mode="middle")
ml, odd = divrem(h, 2)
mh = ml + 1
tmpl, tmph = data[ml], data[mh]
if odd == 0
while l < ml
tmpl, tmph = medianof4(data[l], tmpl, tmph, data[h])
l += 1
h -= 1
end
if mode == "higher" || mode == "high" || mode == "hi" || mode == "h"
return tmph
elseif mode == "lower" || mode == "low" || mode == "lo" || mode == "l"
return tmpl
else
return (tmpl + tmph) / 2
end
else
while l < ml
tmph = medianof3(data[l], tmph, data[h])
l += 1
h -= 1
end
return tmph
end
end
d = rand(0:99, 12000)
@benchmark linmedian($d)
@benchmark median($d)
OK, I did optimize everything I knew how, and I’m not testing for consistency of input and all that, but still, I’m surprised. Am I doing something wrong (I double-checked the results, output is correct)? Did I overlook something? Did I, per chance, find a faster algorithm for median?
I would really like to hear your opinion.