I’m trying to write a faster implementation of `exp`, and along the way I noticed that I have no idea how fast `ldexp` is, and my efforts to benchmark it result in madness.

My code is

```julia
function myexp(x::Float64)
    # Reduction: x = M*ln2 + ln2*(j/1024) + r
    # M, j chosen such that 0 <= j < 1024, abs(r) < ln2/2048
    N = round(Int, x * 1477.3197218702985)  # 1024/ln2
    r = x - N * 0.0006769015435155716       # ln2/1024
    M = N >> 10
    j = N - (M << 10)
    # exp(x) = exp(ln2*M + ln2*(j/1024) + r)
    small_part = J_TABLE[j + 1] * small_exp(r)
    return ldexp(small_part, M)
end
```
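For reference, the snippet becomes runnable if the two undefined names are filled in with hypothetical stand-ins: `J_TABLE` as a plain table of `2^(j/1024)` values and `small_exp` as a truncated Taylor series (which is accurate enough for `|r| < ln2/2048`; a real implementation would use carefully rounded table entries and minimax coefficients):

```julia
# Hypothetical stand-ins for the two undefined helpers, plus the function
# above repeated so this block is self-contained.
const J_TABLE = [2.0^(j / 1024) for j in 0:1023]
small_exp(r) = evalpoly(r, (1.0, 1.0, 0.5, 1 / 6))  # truncated Taylor series

function myexp(x::Float64)
    N = round(Int, x * 1477.3197218702985)  # 1024/ln2
    r = x - N * 0.0006769015435155716       # ln2/1024
    M = N >> 10                             # floor division by 1024
    j = N - (M << 10)                       # remainder, in 0:1023
    small_part = J_TABLE[j + 1] * small_exp(r)
    return ldexp(small_part, M)
end

abs(myexp(1.5) - exp(1.5)) / exp(1.5)  # tiny relative error
```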

Unfortunately, this is about 2x slower than if the last line is replaced with `return small_part * (1 << M)`. When I saw this, everything made sense. I then tried to benchmark the performance of `ldexp` by itself, and discovered that the function, no matter how I timed it (with BenchmarkTools), only took 15 ns. Any ideas what’s happening here?
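One common pitfall when microbenchmarking a call like this (an assumption about what happened here, not a diagnosis) is constant folding: with literal arguments the compiler may evaluate part or all of the call at compile time, so the timing doesn’t reflect the cost inside a real loop with unpredictable `M`. Interpolating `Ref`s into `@btime` forces the arguments to be loaded at run time:

```julia
using BenchmarkTools

# Literal arguments can be partially constant-folded:
@btime ldexp(1.2345, 7)

# Interpolated Refs force the arguments to be loaded each evaluation:
x = Ref(1.2345); m = Ref(7)
@btime ldexp($x[], $m[])
```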

I’d stick with this. Compare the native code.
But it is possible to do even better if you can guarantee that `M` belongs to a narrow range (such as `0 <= M < 64`):
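The code for that trick isn’t shown here, but presumably (my assumption) it means constructing `2.0^M` directly from its IEEE-754 bit pattern and multiplying, which skips the range checks `ldexp` has to do. This is only valid when `M` stays in the normal-exponent range `-1022 <= M <= 1023`, which comfortably contains `0 <= M < 64`:

```julia
# Hypothetical sketch: build 2.0^M from its bit pattern (exponent field
# M + 1023, zero mantissa). No subnormal or overflow handling.
pow2(M::Integer) = reinterpret(Float64, (Int64(M) + 1023) << 52)

fast_scale(small_part, M) = small_part * pow2(M)

fast_scale(1.25, 10)  # == 1280.0
```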

I find that the `ldexp` version takes 6.1 µs vs 3.0 µs for the bit-shifted version. For the actual problem I don’t have any restrictions on the range of `M`, but if it helps, I know that `small_part` is around 1.
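Since `small_part` is near 1, one further option (a sketch under that assumption, not safe in general) is to skip the multiply entirely and add `M` straight into the exponent field of `small_part` — correct for a normal input whose scaled result is also normal, with no handling of overflow, underflow, or subnormals:

```julia
# Add M to the exponent bits of x. Valid only when both x and the scaled
# result are normal floats (no overflow/underflow/subnormal handling).
scale_by_pow2(x::Float64, M::Int) =
    reinterpret(Float64, reinterpret(Int64, x) + (M << 52))

scale_by_pow2(1.5, 4)   # == 24.0
scale_by_pow2(1.5, -2)  # == 0.375
```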