Searchsortedlast performance

roflmaostc · December 7, 2021, 9:01pm

Can you post the latest version of the code? With the improved searchsortedlast?

I guess many people (including me) want to give a shot improving that even more

snowgum · December 7, 2021, 9:58pm

Thanks for the offer! Here is the latest version:

function weno4_opt(xs, xp, fp)
    Ngrid = size(xp)[1]
    ε = 1e-6
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevβidx = -1
    β2 = 0.0
    β3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            i = 2
            while @inbounds x > xp[i]
                i += 1
            end
            i -= 1
        end

        if i == Ngrid
            i -= 1
        end

        if i == 1
            xi = xp[i]
            xip = xp[i+1]
            xipp = xp[i+2]

            hi = xip - xi
            hip = xipp - xip

            yi = fp[i]
            yip = fp[i+1]
            yipp = fp[i+2]

            q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
            q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
            q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)

            fs[idx] = q3
            continue
        elseif i == Ngrid - 1
            xim = xp[i-1]
            xi = xp[i]
            xip = xp[i+1]

            him = xi - xim
            hi = xip - xi

            yim = fp[i-1]
            yi = fp[i]
            yip = fp[i+1]

            q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
            q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
            q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)

            fs[idx] = q2
            continue
        end

        xim = xp[i-1]
        xi = xp[i]
        xip = xp[i+1]
        xipp = xp[i+2]

        him = xi - xim
        hi = xip - xi
        hip = xipp - xip

        yim = fp[i-1]
        yi = fp[i]
        yip = fp[i+1]
        yipp = fp[i+2]

        q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
        q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
        q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)

        q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
        q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
        q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)

        if i != prevβidx
            H = him + hi + hip
            yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
            yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
            yyim -= (him*H) / ((him + hi)*hi*hip) * yip
            yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp

            yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
            yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
            yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
            yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp

            yyip = (hi*hip) / (him*(him + hi)*H) * yim
            yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
            yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
            yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp

            yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
            yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
            yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
            yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp

            β2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
            β3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2

            prevβidx = i
        end
        γ2 = - (x - xipp) / (xipp - xim)
        γ3 = (x - xim) / (xipp - xim)

        α2 = γ2 / (ε + β2)
        α3 = γ3 / (ε + β3)

        ω2 = α2 / (α2 + α3)
        ω3 = α3 / (α2 + α3)

        fs[idx] = ω2*q2 + ω3*q3
    end
    return fs
end

jling · December 7, 2021, 10:34pm

now I wonder if we should improve this directly…

lmiq · December 7, 2021, 10:42pm

Both the numpy and the Julia algorithms use bissections, and that in general should be faster than the simple search I’ve written. The numpy version has some branches which may deal with this specific data better. There is some investigation to be done there. Still a minimal example confirming in which cases one is faster than the other would be needed, maybe what happens here is not a regular pattern.

DNF · December 7, 2021, 11:00pm

This is actually quite typical. Yes, Base functions are well-written and quite optimized, but they are optimized for the general case. If you have an idea for how to address your specific problem, it is likely that you can improve on Base. This is a feature of the language.

jebej · December 7, 2021, 11:14pm

I think the other point here was that since the index i found is always increasing in the enumeration of xs, you don’t need to always start searching from i=2. I think you can simply move the statement out of the loop.

mkitti · December 7, 2021, 11:35pm

The custom version of searchsortedlast is basically findlast with a predicate function. Perhaps you just picked the wrong search function as each is optimized for different use cases. For the size of the arrays in question here, the midpoint bisection search algorithm is not advantageous.

goerch · December 7, 2021, 11:37pm

And which would be better?

mkitti · December 7, 2021, 11:46pm

It depends on the size of the array being searched. How many interpolation knots do we expect?

goerch · December 7, 2021, 11:51pm

Sorry for not following: I understand there are at least linear search, binary search and interpolation search. Which algorithm are you proposing?

mkitti · December 8, 2021, 12:37am

I’m not proposing anything new. Op wanted to know why searchsortedlast is not optimal in this case, and I’m suggesting the linear search in findlast might have been better, which is analogous to the custom loop here. Those are both functions in Base.

Reflecting on the thread as a whole, it’s clear that Python has created a perception that you have to use standard library code to be optimal and that rolling your own for loop will be slow. Here we see there is nothing special about Base or the standard library. It’s just Julia whether you write it yourself or use an existing function.

fedoroff · December 8, 2021, 7:25am

Thank you for the code.

Before this discussion I was not aware of WENO interpolation methods. However, it seems, that they will be very useful in my studies. I hope you will write the corresponding package.

P.S. Does anyone aware of existing weno interpolation implementations in Julia. I only managed to find weno5 function from Kinetic.jl.

DNF · December 8, 2021, 8:22am

Another point here is that numba is jit-compiled using llvm, just like Julia. For straightforward code like this, one shouldn’t necessarily expect Julia to be faster.

lawless-m · December 8, 2021, 11:17am

I broke the code out into functions because I couldn’t see the control flow and something wonderful happend - runtime was 62% of the weno4_opt version

julia>  # show the results are identical because I still don't believe it :)
julia> weno4_opt(xi, xx, yy) - weno4_M(xi, xx, yy) |> unique
1-element Vector{Float64}:
 0.0


julia> @benchmark weno4_M(xi, xx, yy)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  293.163 μs …  2.371 ms  ┊ GC (min … max): 0.00% … 83.72%
 Time  (median):     294.908 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   298.795 μs ± 46.086 μs  ┊ GC (mean ± σ):  0.59% ±  3.13%

  ▄██▇▇▅▃▃▃▂▁                                        ▁▁▁       ▂
  ███████████▇▇▄▃▃▃▃▄▅▄▅▄▅▂▄▄▃▄▃▂▂▄▄▅▄▄▅▃▅▂▄▄▂▄▃▄▂▆█████▇██▇█▇ █
  293 μs        Histogram: log(frequency) by time       329 μs <

 Memory estimate: 78.23 KiB, allocs estimate: 2.


julia> @benchmark weno4_opt(xi, xx, yy)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  478.019 μs …  1.513 ms  ┊ GC (min … max): 0.00% … 64.02%
 Time  (median):     478.699 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   482.409 μs ± 41.598 μs  ┊ GC (mean ± σ):  0.34% ±  2.71%

  ██▃  ▃▄▃▁                                          ▁ ▁       ▂
  ███▅█████▇▆▆▃▅▁▃▃▃▁▃▁▃▁▁▃▁▄▁▁▃▁▃▄▁▃▃▄▄▄▄▁▄▁▁▃▃▃▁▄█▇█████▇█▇█ █
  478 μs        Histogram: log(frequency) by time       513 μs <

 Memory estimate: 78.23 KiB, allocs estimate: 2.


 function iIsOne(x, xp, fp)
    +( 
        fp[1] * ((x - xp[2]) * (x - xp[3])) / ((xp[2] - xp[1]) * (xp[3] - xp[1])),
       -fp[2] * ((x - xp[1]) * (x - xp[3])) / ((xp[2] - xp[1]) * (xp[3] - xp[2])),
        fp[3] * ((x - xp[1]) * (x - xp[2])) / ((xp[3] - xp[1]) * (xp[3] - xp[2]))
      )
 end

 function iIsNgridMinusOne(i, x, xp, fp)
    +(
      fp[i-1] * ((x - xp[i])  * (x - xp[i+1])) / ((xp[i]   - xp[i-1]) * (xp[i+1] - xp[i-1])),
     -fp[i]  * ((x - xp[i-1]) * (x - xp[i+1])) / ((xp[i]   - xp[i-1]) * (xp[i+1] - xp[i]  )),
      fp[i+1] * ((x - xp[i-1]) * (x - xp[i]  )) / ((xp[i+1] - xp[i-1]) * (xp[i+1] - xp[i]  ))
      )
end

function iIsNotPrevβidx(i, xp, fp) 
    xim = xp[i - 1]
    xi = xp[i]
    xip = xp[i + 1]
    xipp = xp[i + 2]
    
    yim = fp[i-1]
    yi = fp[i]
    yip = fp[i+1]
    yipp = fp[i+2] 
  
    yyim = - ((2 * (xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip)) + (xi - xim) * ((xi - xim) + (xip - xi))) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyim += (((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyim -= ((xi - xim) * ((xi - xim) + (xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyim += ((xi - xim) * ((xi - xim) + (xip - xi))) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    yyi = - ((xip - xi) * ((xip - xi) + (xipp - xip))) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyi += ((xip - xi) * ((xip - xi) + (xipp - xip)) - (xi - xim) * (2 * (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyi += ((xi - xim) * ((xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyi -= ((xi - xim) * (xip - xi)) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    yyip = ((xip - xi) * (xipp - xip)) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyip -= ((xipp - xip) * ((xi - xim) + (xip - xi))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyip += (((xi - xim) + 2 * (xip - xi)) * (xipp - xip) - ((xi - xim) + (xip - xi)) * (xip - xi)) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyip += (((xi - xim) + (xip - xi)) * (xip - xi)) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    yyipp = - (((xip - xi) + (xipp - xip)) * (xipp - xip)) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyipp += ((xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyipp -= (((xip - xi) + (xipp - xip)) * ((xi - xim) + (xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyipp += ((2 * (xipp - xip) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip)) + (xipp - xip) * ((xip - xi) + (xipp - xip))) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    β2 = ((xip - xi) + (xipp - xip))^2 * (abs(yyip - yyi) / (xip - xi) - abs(yyi - yyim) / (xi - xim))^2
    β3 = ((xi - xim) + (xip - xi))^2 * (abs(yyipp - yyip) / (xipp - xip) - abs(yyip - yyi) / (xip - xi))^2
    β2, β3
end

function q2q3(i, x, xp, fp)
    xim = xp[i-1]
    xi = xp[i]
    xip = xp[i+1]
    xipp = xp[i+2]

    him = xi - xim
    hi = xip - xi
    hip = xipp - xip

    yim = fp[i-1]
    yi = fp[i]
    yip = fp[i+1]
    yipp = fp[i+2]

    q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
    q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
    q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)

    q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
    q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
    q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
    q2, q3
 end

 function ω2ω3(i, x, xp, β2, β3)
    ε = 1e-6
    γ2 = - (x - xp[i+2]) / (xp[i+2] - xp[i-1])
    γ3 =   (x - xp[i-1]) / (xp[i+2] - xp[i-1])

    α2 = γ2 / (ε + β2)
    α3 = γ3 / (ε + β3)

    α2 / (α2 + α3),  α3 / (α2 + α3)
 end

 function weno4_M(xs, xp, fp)
    Ngrid = size(xp)[1]
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevβidx = -1
    β2 = 0.0
    β3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            i = 2
            while @inbounds x > xp[i]
                i += 1
            end
            i -= 1
        end

        if i == Ngrid
            i -= 1
        end

        if i == 1
            fs[idx] = iIsOne(x, xp, fp)
            continue
        elseif i == Ngrid - 1
            fs[idx] = iIsNgridMinusOne(i, x, xp, fp)
            continue
        end

        q2, q3 = q2q3(i, x, xp, fp)

        if i != prevβidx
            β2, β3 = iIsNotPrevβidx(i, xp, fp)
            prevβidx = i
        end

        ω2, ω3 = ω2ω3(i, x, xp, β2, β3)

        fs[idx] = ω2 * q2 + ω3 * q3
    end
    return fs
end

mikmoore · December 9, 2021, 12:41am

A significant fraction of the runtime of searchsortedlast is spent performing isless comparisons. Note that isless has some extra machinery to handle NaN since we’re using floating-point arguments.

I see float performance improved significantly by using < instead of the default isless. THIS SUBSTITUTION WILL PRODUCE INCORRECT RESULTS IF NaN APPEARS IN ANY OF YOUR INPUTS.

using BenchmarkTools
v = cumsum(rand(1000)); v ./= last(v);

@benchmark searchsortedlast($v,z) setup=(z=rand())
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  13.627 ns … 26.453 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.032 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.049 ns ±  0.351 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▁ ▁ ▁                ▂ ▅ ▇▇ █ █ ▇▆ ▃           ▂
  ▃▁▁▅▁▁▃▁▃▁▆▁██▁█▁█▁█▅▁▁▁▃▁▅▆▁▆▁▇▇▁█▁█▁██▁█▁█▁██▁█▁█▁▅▇▁▇▁█▆ █
  13.6 ns      Histogram: log(frequency) by time        17 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@benchmark searchsortedlast($v,z;lt=<) setup=(z=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):   7.908 ns … 24.525 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.711 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.643 ns ±  0.888 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                          ▃ ▅    ▅▁     █    ▂ ▅    ▆
  ▂▂▁▁▁▂▂▁▁▁▂▁▃▃▁▂▂▄▁█▃▄▁▃█▁█▃▅▁▃██▁▃▃▁▅█▅▁▃▃█▁█▃▁▂▃█▁█▂▂▁▄▇▃ ▃
  7.91 ns         Histogram: frequency by time        12.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Bardo · December 9, 2021, 8:53am

See Fastest data structure for a priority queue - General Usage / Performance - JuliaLang

Interestingly, the break-even for binary search appears close to the MWE’s vector length 20 for xp and yp:

n = 20
  271.237 μs (2 allocations: 78.23 KiB)
  274.209 μs (2 allocations: 78.23 KiB)
n = 200
  837.772 μs (2 allocations: 78.23 KiB)
  356.019 μs (2 allocations: 78.23 KiB)
n = 2000
  4.551 ms (2 allocations: 78.23 KiB)
  523.962 μs (2 allocations: 78.23 KiB)

function weno4_opt(xs, xp, fp)
    Ngrid = size(xp)[1]
    ε = 1e-6
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevβidx = -1
    β2 = 0.0
    β3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            i = 2
            while @inbounds x > xp[i]
                i += 1
            end
            i -= 1
        end
        
        if i == Ngrid
            i -= 1
        end
        
        if i == 1
            xi = xp[i]
            xip = xp[i+1]
            xipp = xp[i+2]
            
            hi = xip - xi
            hip = xipp - xip
            
            yi = fp[i]
            yip = fp[i+1]
            yipp = fp[i+2]
            
            q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
            q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
            q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
            
            fs[idx] = q3
            continue
        elseif i == Ngrid - 1
            xim = xp[i-1]
            xi = xp[i]
            xip = xp[i+1]
            
            him = xi - xim
            hi = xip - xi
            
            yim = fp[i-1]
            yi = fp[i]
            yip = fp[i+1]
            
            q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
            q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
            q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
            
            fs[idx] = q2
            continue
        end
        
        xim = xp[i-1]
        xi = xp[i]
        xip = xp[i+1]
        xipp = xp[i+2]
        
        him = xi - xim
        hi = xip - xi
        hip = xipp - xip
        
        yim = fp[i-1]
        yi = fp[i]
        yip = fp[i+1]
        yipp = fp[i+2]
        
        q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
        q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
        q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
        
        q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
        q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
        q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
        
        if i != prevβidx
            H = him + hi + hip
            yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
            yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
            yyim -= (him*H) / ((him + hi)*hi*hip) * yip
            yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
            
            yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
            yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
            yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
            yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
            
            yyip = (hi*hip) / (him*(him + hi)*H) * yim
            yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
            yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
            yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
            
            yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
            yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
            yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
            yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
            
            β2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
            β3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
            
            prevβidx = i
        end
        γ2 = - (x - xipp) / (xipp - xim)
        γ3 = (x - xim) / (xipp - xim)
        
        α2 = γ2 / (ε + β2)
        α3 = γ3 / (ε + β3)
        
        ω2 = α2 / (α2 + α3)
        ω3 = α3 / (α2 + α3)
        
        fs[idx] = ω2*q2 + ω3*q3
    end
    return fs
end

function weno4_opt_binsearch(xs, xp, fp)
    Ngrid = size(xp)[1]
    ε = 1e-6
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevβidx = -1
    β2 = 0.0
    β3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            ### linear search
            #i = 2
            #while @inbounds x > xp[i]
            #    i += 1
            #end
            #i -= 1
            ### binary search
            ileft = 2
            i = Ngrid - 1    
            while ileft <= i
                mid = (ileft + i) >> 1
                if @inbounds x < xp[mid]
                    i = mid - 1
                else
                    ileft = mid + 1
                end
            end
        end
        
        if i == Ngrid
            i -= 1
        end
        
        if i == 1
            xi = xp[i]
            xip = xp[i+1]
            xipp = xp[i+2]
            
            hi = xip - xi
            hip = xipp - xip
            
            yi = fp[i]
            yip = fp[i+1]
            yipp = fp[i+2]
            
            q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
            q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
            q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
            
            fs[idx] = q3
            continue
        elseif i == Ngrid - 1
            xim = xp[i-1]
            xi = xp[i]
            xip = xp[i+1]
            
            him = xi - xim
            hi = xip - xi
            
            yim = fp[i-1]
            yi = fp[i]
            yip = fp[i+1]
            
            q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
            q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
            q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
            
            fs[idx] = q2
            continue
        end
        
        xim = xp[i-1]
        xi = xp[i]
        xip = xp[i+1]
        xipp = xp[i+2]
        
        him = xi - xim
        hi = xip - xi
        hip = xipp - xip
        
        yim = fp[i-1]
        yi = fp[i]
        yip = fp[i+1]
        yipp = fp[i+2]
        
        q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
        q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
        q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
        
        q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
        q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
        q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
        
        if i != prevβidx
            H = him + hi + hip
            yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
            yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
            yyim -= (him*H) / ((him + hi)*hi*hip) * yip
            yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
            
            yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
            yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
            yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
            yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
            
            yyip = (hi*hip) / (him*(him + hi)*H) * yim
            yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
            yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
            yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
            
            yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
            yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
            yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
            yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
            
            β2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
            β3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
            
            prevβidx = i
        end
        γ2 = - (x - xipp) / (xipp - xim)
        γ3 = (x - xim) / (xipp - xim)
        
        α2 = γ2 / (ε + β2)
        α3 = γ3 / (ε + β3)
        
        ω2 = α2 / (α2 + α3)
        ω3 = α3 / (α2 + α3)
        
        fs[idx] = ω2*q2 + ω3*q3
    end
    return fs
end

using BenchmarkTools

n = 20
xp = sort(randn(n)) # mimicking the OP's data
yp = -sort(-rand(n))
x = collect(LinRange(minimum(xp), maximum(xp), 10001))

@btime weno4_opt(x, xp, yp)  evals=1000
@btime weno4_opt_binsearch(x, xp, yp)  evals=1000

snowgum · December 9, 2021, 10:33am

Thanks for the suggestion. In my tests, using the same length(xi)=10001, your version is actually slightly slower:

julia> @benchmark weno4_opt(xi, xx, yy) evals=10
BenchmarkTools.Trial: 2777 samples with 10 evaluations.
 Range (min … max):  135.900 μs … 777.403 μs  ┊ GC (min … max): 0.00% … 73.59%
 Time  (median):     169.912 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   179.509 μs ±  65.292 μs  ┊ GC (mean ± σ):  4.77% ±  9.85%

  ▁ ▇▇█▅▄▂▁▁                                                    ▁
  ████████████▆▆▅▅▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅▅▅▇ █
  136 μs        Histogram: log(frequency) by time        635 μs <

 Memory estimate: 78.27 KiB, allocs estimate: 2.

julia> @benchmark weno4_M(xi, xx, yy) evals=10
BenchmarkTools.Trial: 2479 samples with 10 evaluations.
 Range (min … max):  160.647 μs … 856.511 μs  ┊ GC (min … max): 0.00% … 78.87%
 Time  (median):     191.992 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   201.213 μs ±  65.491 μs  ┊ GC (mean ± σ):  4.24% ±  9.47%

  ▁ █▇█▅▄▂▁                                                     ▁
  ███████████▆▅▅▁▃▅▃▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅▅▇ █
  161 μs        Histogram: log(frequency) by time        654 μs <

 Memory estimate: 78.27 KiB, allocs estimate: 2.

lawless-m · December 9, 2021, 10:39am

That’s what I expected to happen. Hence my disbelief.

What I have thought about is on various iterations it does various combinations of things like

 xp[i] -  xp[i - 1]
 xp[i + 1] - xp[i]

so when i = 5 that’s

 xp[5] -  xp[4] # xi - xim
 xp[6] - xp[5]  # xip - xi

and when i = 6 that becomes

 xp[6] -  xp[5] # xi - xim
 xp[7] - xp[6] # xip - xi

So perhaps we can trade space for time and pre calculate all of the xp[i] - xp[i - 1] using vector math

Anyway, that was something to post after I had tried it

snowgum · December 9, 2021, 12:35pm

Thanks for the suggestion. The binary search seems a much more solid choice for the general case.

snowgum · December 9, 2021, 12:43pm

My plan is to submit a PR to Interpolations.jl, which I think is more useful to the community than a standalone package. This effort was to get a working version that I could validate against the python version. Under Interpolations.jl, the idea is to pre-calculate the coefficients for all intervals between the nodes and then build an interpolant function that will apply the formula to any value to be interpolated. Here WENO will be similar in application to the already supported monotonic interpolation (which, by the way, in Interpolations.jl makes use of searchsortedfirst).

I am still getting started in Julia, so will probably need some help to get a clean version that can be used in Interpolations.jl.

Topic		Replies	Views
Comparison benchmarks (numerical) Meta Discussion	7	1043	March 9, 2019
Python vs. Julia ODE Solver Performance python	30	4368	March 3, 2021
Julia Beginner (from Python): Numba outperforms Julia in rewrite. Any tips to improve performance? Performance benchmark , python , tullio , loopvectorization	56	6090	August 18, 2021
Julia multithreading no performance improvement General Usage multithreading	15	872	May 31, 2022
Julia slower than Python to sort and reverse a list of integers Performance	40	2517	April 28, 2023

Searchsortedlast performance

Related topics