I’m trying to write a faster implementation of `exp`, and along the way I noticed that I have no idea how fast `ldexp` is, and my efforts to benchmark it result in madness.

My code is

```julia
function myexp(x::Float64)
    # Reduction: x = M*ln2 + ln2*(j/1024) + r
    # M, j chosen such that 0 <= j < 1024, abs(r) < ln2/2048
    N = round(Int, x * 1477.3197218702985)  # 1024/ln2
    r = x - N * 0.0006769015435155716       # ln2/1024
    M = N >> 10
    j = N - (M << 10)
    # exp(x) = exp(ln2*M + ln2*(j/1024) + r)
    small_part = J_TABLE[j + 1] * small_exp(r)
    return ldexp(small_part, M)
end
```
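For reference, the snippet becomes runnable if the two undefined names are filled in with hypothetical stand-ins: `J_TABLE` as a plain table of `2^(j/1024)` values and `small_exp` as a truncated Taylor series (which is accurate enough for `|r| < ln2/2048`; a real implementation would use carefully rounded table entries and minimax coefficients):

```julia
# Hypothetical stand-ins for the two undefined helpers, plus the function
# above repeated so this block is self-contained.
const J_TABLE = [2.0^(j / 1024) for j in 0:1023]
small_exp(r) = evalpoly(r, (1.0, 1.0, 0.5, 1 / 6))  # truncated Taylor series

function myexp(x::Float64)
    N = round(Int, x * 1477.3197218702985)  # 1024/ln2
    r = x - N * 0.0006769015435155716       # ln2/1024
    M = N >> 10                             # floor division by 1024
    j = N - (M << 10)                       # remainder, in 0:1023
    small_part = J_TABLE[j + 1] * small_exp(r)
    return ldexp(small_part, M)
end

abs(myexp(1.5) - exp(1.5)) / exp(1.5)  # tiny relative error
```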

Unfortunately, this is about 2x slower than if the last line is replaced with `return small_part * (1 << M)`. When I saw this, everything made sense. I then tried to benchmark the performance of `ldexp` by itself, and discovered that the function, no matter how I timed it (with BenchmarkTools), only took 15 ns. Any ideas what’s happening here?
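One common pitfall when microbenchmarking a call like this (an assumption about what happened here, not a diagnosis) is constant folding: with literal arguments the compiler may evaluate part or all of the call at compile time, so the timing doesn’t reflect the cost inside a real loop with unpredictable `M`. Interpolating `Ref`s into `@btime` forces the arguments to be loaded at run time:

```julia
using BenchmarkTools

# Literal arguments can be partially constant-folded:
@btime ldexp(1.2345, 7)

# Interpolated Refs force the arguments to be loaded each evaluation:
x = Ref(1.2345); m = Ref(7)
@btime ldexp($x[], $m[])
```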

I’d stick with this. Compare the native code.
But it is possible to do even better if you can guarantee that `M` belongs to a narrow range (such as `0 <= M < 64`):
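The code for that trick isn’t shown here, but presumably (my assumption) it means constructing `2.0^M` directly from its IEEE-754 bit pattern and multiplying, which skips the range checks `ldexp` has to do. This is only valid when `M` stays in the normal-exponent range `-1022 <= M <= 1023`, which comfortably contains `0 <= M < 64`:

```julia
# Hypothetical sketch: build 2.0^M from its bit pattern (exponent field
# M + 1023, zero mantissa). No subnormal or overflow handling.
pow2(M::Integer) = reinterpret(Float64, (Int64(M) + 1023) << 52)

fast_scale(small_part, M) = small_part * pow2(M)

fast_scale(1.25, 10)  # == 1280.0
```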

I find that the `ldexp` version takes 6.1 µs vs 3.0 µs for the bit-shifted version. For the actual problem I don’t have any restrictions on the range of `M`, but if it helps, I know that `small_part` is around 1.
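Since `small_part` is near 1, one further option (a sketch under that assumption, not safe in general) is to skip the multiply entirely and add `M` straight into the exponent field of `small_part` — correct for a normal input whose scaled result is also normal, with no handling of overflow, underflow, or subnormals:

```julia
# Add M to the exponent bits of x. Valid only when both x and the scaled
# result are normal floats (no overflow/underflow/subnormal handling).
scale_by_pow2(x::Float64, M::Int) =
    reinterpret(Float64, reinterpret(Int64, x) + (M << 52))

scale_by_pow2(1.5, 4)   # == 24.0
scale_by_pow2(1.5, -2)  # == 0.375
```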