Searchsortedlast performance

Can you post the latest version of the code? With the improved searchsortedlast?

I guess many people (including me) want to give a shot improving that even more :smiley:

3 Likes

Thanks for the offer! Here is the latest version:

function weno4_opt(xs, xp, fp)
    Ngrid = size(xp)[1]
    Ξ΅ = 1e-6
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevΞ²idx = -1
    Ξ²2 = 0.0
    Ξ²3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            i = 2
            while @inbounds x > xp[i]
                i += 1
            end
            i -= 1
        end

        if i == Ngrid
            i -= 1
        end

        if i == 1
            xi = xp[i]
            xip = xp[i+1]
            xipp = xp[i+2]

            hi = xip - xi
            hip = xipp - xip

            yi = fp[i]
            yip = fp[i+1]
            yipp = fp[i+2]

            q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
            q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
            q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)

            fs[idx] = q3
            continue
        elseif i == Ngrid - 1
            xim = xp[i-1]
            xi = xp[i]
            xip = xp[i+1]

            him = xi - xim
            hi = xip - xi

            yim = fp[i-1]
            yi = fp[i]
            yip = fp[i+1]

            q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
            q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
            q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)

            fs[idx] = q2
            continue
        end

        xim = xp[i-1]
        xi = xp[i]
        xip = xp[i+1]
        xipp = xp[i+2]

        him = xi - xim
        hi = xip - xi
        hip = xipp - xip

        yim = fp[i-1]
        yi = fp[i]
        yip = fp[i+1]
        yipp = fp[i+2]

        q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
        q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
        q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)

        q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
        q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
        q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)

        if i != prevΞ²idx
            H = him + hi + hip
            yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
            yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
            yyim -= (him*H) / ((him + hi)*hi*hip) * yip
            yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp

            yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
            yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
            yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
            yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp

            yyip = (hi*hip) / (him*(him + hi)*H) * yim
            yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
            yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
            yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp

            yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
            yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
            yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
            yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp

            Ξ²2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
            Ξ²3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2

            prevΞ²idx = i
        end
        Ξ³2 = - (x - xipp) / (xipp - xim)
        Ξ³3 = (x - xim) / (xipp - xim)

        Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
        Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)

        Ο‰2 = Ξ±2 / (Ξ±2 + Ξ±3)
        Ο‰3 = Ξ±3 / (Ξ±2 + Ξ±3)

        fs[idx] = Ο‰2*q2 + Ο‰3*q3
    end
    return fs
end
1 Like

now I wonder if we should improve this directly…

1 Like

Both the numpy and the Julia algorithms use bissections, and that in general should be faster than the simple search I’ve written. The numpy version has some branches which may deal with this specific data better. There is some investigation to be done there. Still a minimal example confirming in which cases one is faster than the other would be needed, maybe what happens here is not a regular pattern.

1 Like

This is actually quite typical. Yes, Base functions are well-written and quite optimized, but they are optimized for the general case. If you have an idea for how to address your specific problem, it is likely that you can improve on Base. This is a feature of the language.

8 Likes

I think the other point here was that since the index i found is always increasing in the enumeration of xs, you don’t need to always start searching from i=2. I think you can simply move the statement out of the loop.

1 Like

The custom version of searchsortedlast is basically findlast with a predicate function. Perhaps you just picked the wrong search function as each is optimized for different use cases. For the size of the arrays in question here, the midpoint bisection search algorithm is not advantageous.

2 Likes

And which would be better?

It depends on the size of the array being searched. How many interpolation knots do we expect?

Sorry for not following: I understand there are at least linear search, binary search and interpolation search. Which algorithm are you proposing?

I’m not proposing anything new. Op wanted to know why searchsortedlast is not optimal in this case, and I’m suggesting the linear search in findlast might have been better, which is analogous to the custom loop here. Those are both functions in Base.

Reflecting on the thread as a whole, it’s clear that Python has created a perception that you have to use standard library code to be optimal and that rolling your own for loop will be slow. Here we see there is nothing special about Base or the standard library. It’s just Julia whether you write it yourself or use an existing function.

8 Likes

Thank you for the code.

Before this discussion I was not aware of WENO interpolation methods. However, it seems, that they will be very useful in my studies. I hope you will write the corresponding package.

P.S. Does anyone aware of existing weno interpolation implementations in Julia. I only managed to find weno5 function from Kinetic.jl.

Another point here is that numba is jit-compiled using llvm, just like Julia. For straightforward code like this, one shouldn’t necessarily expect Julia to be faster.

1 Like

I broke the code out into functions because I couldn’t see the control flow and something wonderful happend - runtime was 62% of the weno4_opt version

julia>  # show the results are identical because I still don't believe it :)
julia> weno4_opt(xi, xx, yy) - weno4_M(xi, xx, yy) |> unique
1-element Vector{Float64}:
 0.0


julia> @benchmark weno4_M(xi, xx, yy)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  293.163 ΞΌs …  2.371 ms  β”Š GC (min … max): 0.00% … 83.72%
 Time  (median):     294.908 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   298.795 ΞΌs Β± 46.086 ΞΌs  β”Š GC (mean Β± Οƒ):  0.59% Β±  3.13%

  β–„β–ˆβ–ˆβ–‡β–‡β–…β–ƒβ–ƒβ–ƒβ–‚β–                                        ▁▁▁       β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–„β–ƒβ–ƒβ–ƒβ–ƒβ–„β–…β–„β–…β–„β–…β–‚β–„β–„β–ƒβ–„β–ƒβ–‚β–‚β–„β–„β–…β–„β–„β–…β–ƒβ–…β–‚β–„β–„β–‚β–„β–ƒβ–„β–‚β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–‡β–ˆβ–‡ β–ˆ
  293 ΞΌs        Histogram: log(frequency) by time       329 ΞΌs <

 Memory estimate: 78.23 KiB, allocs estimate: 2.


julia> @benchmark weno4_opt(xi, xx, yy)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  478.019 ΞΌs …  1.513 ms  β”Š GC (min … max): 0.00% … 64.02%
 Time  (median):     478.699 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   482.409 ΞΌs Β± 41.598 ΞΌs  β”Š GC (mean Β± Οƒ):  0.34% Β±  2.71%

  β–ˆβ–ˆβ–ƒ  ▃▄▃▁                                          ▁ ▁       β–‚
  β–ˆβ–ˆβ–ˆβ–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–†β–ƒβ–…β–β–ƒβ–ƒβ–ƒβ–β–ƒβ–β–ƒβ–β–β–ƒβ–β–„β–β–β–ƒβ–β–ƒβ–„β–β–ƒβ–ƒβ–„β–„β–„β–„β–β–„β–β–β–ƒβ–ƒβ–ƒβ–β–„β–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–ˆ β–ˆ
  478 ΞΌs        Histogram: log(frequency) by time       513 ΞΌs <

 Memory estimate: 78.23 KiB, allocs estimate: 2.



 function iIsOne(x, xp, fp)
    +( 
        fp[1] * ((x - xp[2]) * (x - xp[3])) / ((xp[2] - xp[1]) * (xp[3] - xp[1])),
       -fp[2] * ((x - xp[1]) * (x - xp[3])) / ((xp[2] - xp[1]) * (xp[3] - xp[2])),
        fp[3] * ((x - xp[1]) * (x - xp[2])) / ((xp[3] - xp[1]) * (xp[3] - xp[2]))
      )
 end

 function iIsNgridMinusOne(i, x, xp, fp)
    +(
      fp[i-1] * ((x - xp[i])  * (x - xp[i+1])) / ((xp[i]   - xp[i-1]) * (xp[i+1] - xp[i-1])),
     -fp[i]  * ((x - xp[i-1]) * (x - xp[i+1])) / ((xp[i]   - xp[i-1]) * (xp[i+1] - xp[i]  )),
      fp[i+1] * ((x - xp[i-1]) * (x - xp[i]  )) / ((xp[i+1] - xp[i-1]) * (xp[i+1] - xp[i]  ))
      )
end

function iIsNotPrevΞ²idx(i, xp, fp) 
    xim = xp[i - 1]
    xi = xp[i]
    xip = xp[i + 1]
    xipp = xp[i + 2]
    
    yim = fp[i-1]
    yi = fp[i]
    yip = fp[i+1]
    yipp = fp[i+2] 
  
    yyim = - ((2 * (xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip)) + (xi - xim) * ((xi - xim) + (xip - xi))) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyim += (((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyim -= ((xi - xim) * ((xi - xim) + (xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyim += ((xi - xim) * ((xi - xim) + (xip - xi))) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    yyi = - ((xip - xi) * ((xip - xi) + (xipp - xip))) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyi += ((xip - xi) * ((xip - xi) + (xipp - xip)) - (xi - xim) * (2 * (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyi += ((xi - xim) * ((xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyi -= ((xi - xim) * (xip - xi)) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    yyip = ((xip - xi) * (xipp - xip)) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyip -= ((xipp - xip) * ((xi - xim) + (xip - xi))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyip += (((xi - xim) + 2 * (xip - xi)) * (xipp - xip) - ((xi - xim) + (xip - xi)) * (xip - xi)) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyip += (((xi - xim) + (xip - xi)) * (xip - xi)) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    yyipp = - (((xip - xi) + (xipp - xip)) * (xipp - xip)) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
    yyipp += ((xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
    yyipp -= (((xip - xi) + (xipp - xip)) * ((xi - xim) + (xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
    yyipp += ((2 * (xipp - xip) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip)) + (xipp - xip) * ((xip - xi) + (xipp - xip))) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
  
    Ξ²2 = ((xip - xi) + (xipp - xip))^2 * (abs(yyip - yyi) / (xip - xi) - abs(yyi - yyim) / (xi - xim))^2
    Ξ²3 = ((xi - xim) + (xip - xi))^2 * (abs(yyipp - yyip) / (xipp - xip) - abs(yyip - yyi) / (xip - xi))^2
    Ξ²2, Ξ²3
end

function q2q3(i, x, xp, fp)
    xim = xp[i-1]
    xi = xp[i]
    xip = xp[i+1]
    xipp = xp[i+2]

    him = xi - xim
    hi = xip - xi
    hip = xipp - xip

    yim = fp[i-1]
    yi = fp[i]
    yip = fp[i+1]
    yipp = fp[i+2]

    q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
    q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
    q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)

    q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
    q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
    q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
    q2, q3
 end

 function Ο‰2Ο‰3(i, x, xp, Ξ²2, Ξ²3)
    Ξ΅ = 1e-6
    Ξ³2 = - (x - xp[i+2]) / (xp[i+2] - xp[i-1])
    Ξ³3 =   (x - xp[i-1]) / (xp[i+2] - xp[i-1])

    Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
    Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)

    Ξ±2 / (Ξ±2 + Ξ±3),  Ξ±3 / (Ξ±2 + Ξ±3)
 end

 function weno4_M(xs, xp, fp)
    Ngrid = size(xp)[1]
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevΞ²idx = -1
    Ξ²2 = 0.0
    Ξ²3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            i = 2
            while @inbounds x > xp[i]
                i += 1
            end
            i -= 1
        end

        if i == Ngrid
            i -= 1
        end

        if i == 1
            fs[idx] = iIsOne(x, xp, fp)
            continue
        elseif i == Ngrid - 1
            fs[idx] = iIsNgridMinusOne(i, x, xp, fp)
            continue
        end

        q2, q3 = q2q3(i, x, xp, fp)

        if i != prevΞ²idx
            Ξ²2, Ξ²3 = iIsNotPrevΞ²idx(i, xp, fp)
            prevΞ²idx = i
        end

        Ο‰2, Ο‰3 = Ο‰2Ο‰3(i, x, xp, Ξ²2, Ξ²3)

        fs[idx] = Ο‰2 * q2 + Ο‰3 * q3
    end
    return fs
end
10 Likes

A significant fraction of the runtime of searchsortedlast is spent performing isless comparisons. Note that isless has some extra machinery to handle NaN since we’re using floating-point arguments.

I see float performance improved significantly by using < instead of the default isless. THIS SUBSTITUTION WILL PRODUCE INCORRECT RESULTS IF NaN APPEARS IN ANY OF YOUR INPUTS.

using BenchmarkTools
v = cumsum(rand(1000)); v ./= last(v);

@benchmark searchsortedlast($v,z) setup=(z=rand())
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  13.627 ns … 26.453 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     16.032 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   16.049 ns Β±  0.351 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

               ▁ ▁ ▁                β–‚ β–… β–‡β–‡ β–ˆ β–ˆ β–‡β–† β–ƒ           β–‚
  β–ƒβ–β–β–…β–β–β–ƒβ–β–ƒβ–β–†β–β–ˆβ–ˆβ–β–ˆβ–β–ˆβ–β–ˆβ–…β–β–β–β–ƒβ–β–…β–†β–β–†β–β–‡β–‡β–β–ˆβ–β–ˆβ–β–ˆβ–ˆβ–β–ˆβ–β–ˆβ–β–ˆβ–ˆβ–β–ˆβ–β–ˆβ–β–…β–‡β–β–‡β–β–ˆβ–† β–ˆ
  13.6 ns      Histogram: log(frequency) by time        17 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@benchmark searchsortedlast($v,z;lt=<) setup=(z=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):   7.908 ns … 24.525 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     10.711 ns              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   10.643 ns Β±  0.888 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

                          β–ƒ β–…    ▅▁     β–ˆ    β–‚ β–…    β–†
  β–‚β–‚β–β–β–β–‚β–‚β–β–β–β–‚β–β–ƒβ–ƒβ–β–‚β–‚β–„β–β–ˆβ–ƒβ–„β–β–ƒβ–ˆβ–β–ˆβ–ƒβ–…β–β–ƒβ–ˆβ–ˆβ–β–ƒβ–ƒβ–β–…β–ˆβ–…β–β–ƒβ–ƒβ–ˆβ–β–ˆβ–ƒβ–β–‚β–ƒβ–ˆβ–β–ˆβ–‚β–‚β–β–„β–‡β–ƒ β–ƒ
  7.91 ns         Histogram: frequency by time        12.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
13 Likes

See Fastest data structure for a priority queue - General Usage / Performance - JuliaLang

Interestingly, the break-even for binary search appears close to the MWE’s vector length 20 for xp and yp:

n = 20
  271.237 ΞΌs (2 allocations: 78.23 KiB)
  274.209 ΞΌs (2 allocations: 78.23 KiB)
n = 200
  837.772 ΞΌs (2 allocations: 78.23 KiB)
  356.019 ΞΌs (2 allocations: 78.23 KiB)
n = 2000
  4.551 ms (2 allocations: 78.23 KiB)
  523.962 ΞΌs (2 allocations: 78.23 KiB)
function weno4_opt(xs, xp, fp)
    Ngrid = size(xp)[1]
    Ξ΅ = 1e-6
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevΞ²idx = -1
    Ξ²2 = 0.0
    Ξ²3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            i = 2
            while @inbounds x > xp[i]
                i += 1
            end
            i -= 1
        end
        
        if i == Ngrid
            i -= 1
        end
        
        if i == 1
            xi = xp[i]
            xip = xp[i+1]
            xipp = xp[i+2]
            
            hi = xip - xi
            hip = xipp - xip
            
            yi = fp[i]
            yip = fp[i+1]
            yipp = fp[i+2]
            
            q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
            q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
            q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
            
            fs[idx] = q3
            continue
        elseif i == Ngrid - 1
            xim = xp[i-1]
            xi = xp[i]
            xip = xp[i+1]
            
            him = xi - xim
            hi = xip - xi
            
            yim = fp[i-1]
            yi = fp[i]
            yip = fp[i+1]
            
            q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
            q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
            q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
            
            fs[idx] = q2
            continue
        end
        
        xim = xp[i-1]
        xi = xp[i]
        xip = xp[i+1]
        xipp = xp[i+2]
        
        him = xi - xim
        hi = xip - xi
        hip = xipp - xip
        
        yim = fp[i-1]
        yi = fp[i]
        yip = fp[i+1]
        yipp = fp[i+2]
        
        q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
        q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
        q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
        
        q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
        q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
        q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
        
        if i != prevΞ²idx
            H = him + hi + hip
            yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
            yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
            yyim -= (him*H) / ((him + hi)*hi*hip) * yip
            yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
            
            yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
            yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
            yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
            yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
            
            yyip = (hi*hip) / (him*(him + hi)*H) * yim
            yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
            yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
            yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
            
            yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
            yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
            yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
            yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
            
            Ξ²2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
            Ξ²3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
            
            prevΞ²idx = i
        end
        Ξ³2 = - (x - xipp) / (xipp - xim)
        Ξ³3 = (x - xim) / (xipp - xim)
        
        Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
        Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)
        
        Ο‰2 = Ξ±2 / (Ξ±2 + Ξ±3)
        Ο‰3 = Ξ±3 / (Ξ±2 + Ξ±3)
        
        fs[idx] = Ο‰2*q2 + Ο‰3*q3
    end
    return fs
end

function weno4_opt_binsearch(xs, xp, fp)
    Ngrid = size(xp)[1]
    Ξ΅ = 1e-6
    fs = similar(xs)
    left = fp[1]
    right = fp[end]
    prevΞ²idx = -1
    Ξ²2 = 0.0
    Ξ²3 = 0.0
    @inbounds for (idx, x) in enumerate(xs)
        if x < xp[1]
            fs[idx] = left
            continue
        elseif x > xp[end]
            fs[idx] = right
            continue
        else
            ### linear search
            #i = 2
            #while @inbounds x > xp[i]
            #    i += 1
            #end
            #i -= 1
            ### binary search
            ileft = 2
            i = Ngrid - 1    
            while ileft <= i
                mid = (ileft + i) >> 1
                if @inbounds x < xp[mid]
                    i = mid - 1
                else
                    ileft = mid + 1
                end
            end
        end
        
        if i == Ngrid
            i -= 1
        end
        
        if i == 1
            xi = xp[i]
            xip = xp[i+1]
            xipp = xp[i+2]
            
            hi = xip - xi
            hip = xipp - xip
            
            yi = fp[i]
            yip = fp[i+1]
            yipp = fp[i+2]
            
            q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
            q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
            q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
            
            fs[idx] = q3
            continue
        elseif i == Ngrid - 1
            xim = xp[i-1]
            xi = xp[i]
            xip = xp[i+1]
            
            him = xi - xim
            hi = xip - xi
            
            yim = fp[i-1]
            yi = fp[i]
            yip = fp[i+1]
            
            q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
            q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
            q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
            
            fs[idx] = q2
            continue
        end
        
        xim = xp[i-1]
        xi = xp[i]
        xip = xp[i+1]
        xipp = xp[i+2]
        
        him = xi - xim
        hi = xip - xi
        hip = xipp - xip
        
        yim = fp[i-1]
        yi = fp[i]
        yip = fp[i+1]
        yipp = fp[i+2]
        
        q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
        q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
        q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
        
        q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
        q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
        q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
        
        if i != prevΞ²idx
            H = him + hi + hip
            yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
            yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
            yyim -= (him*H) / ((him + hi)*hi*hip) * yip
            yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
            
            yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
            yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
            yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
            yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
            
            yyip = (hi*hip) / (him*(him + hi)*H) * yim
            yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
            yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
            yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
            
            yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
            yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
            yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
            yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
            
            Ξ²2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
            Ξ²3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
            
            prevΞ²idx = i
        end
        Ξ³2 = - (x - xipp) / (xipp - xim)
        Ξ³3 = (x - xim) / (xipp - xim)
        
        Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
        Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)
        
        Ο‰2 = Ξ±2 / (Ξ±2 + Ξ±3)
        Ο‰3 = Ξ±3 / (Ξ±2 + Ξ±3)
        
        fs[idx] = Ο‰2*q2 + Ο‰3*q3
    end
    return fs
end

using BenchmarkTools

n = 20
xp = sort(randn(n)) # mimicking the OP's data
yp = -sort(-rand(n))
x = collect(LinRange(minimum(xp), maximum(xp), 10001))

@btime weno4_opt(x, xp, yp)  evals=1000
@btime weno4_opt_binsearch(x, xp, yp)  evals=1000


Thanks for the suggestion. In my tests, using the same length(xi)=10001, your version is actually slightly slower:

julia> @benchmark weno4_opt(xi, xx, yy) evals=10
BenchmarkTools.Trial: 2777 samples with 10 evaluations.
 Range (min … max):  135.900 ΞΌs … 777.403 ΞΌs  β”Š GC (min … max): 0.00% … 73.59%
 Time  (median):     169.912 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   179.509 ΞΌs Β±  65.292 ΞΌs  β”Š GC (mean Β± Οƒ):  4.77% Β±  9.85%

  ▁ β–‡β–‡β–ˆβ–…β–„β–‚β–β–                                                    ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–…β–…β–ƒβ–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–…β–…β–…β–‡ β–ˆ
  136 ΞΌs        Histogram: log(frequency) by time        635 ΞΌs <

 Memory estimate: 78.27 KiB, allocs estimate: 2.
julia> @benchmark weno4_M(xi, xx, yy) evals=10
BenchmarkTools.Trial: 2479 samples with 10 evaluations.
 Range (min … max):  160.647 ΞΌs … 856.511 ΞΌs  β”Š GC (min … max): 0.00% … 78.87%
 Time  (median):     191.992 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   201.213 ΞΌs Β±  65.491 ΞΌs  β”Š GC (mean Β± Οƒ):  4.24% Β±  9.47%

  ▁ β–ˆβ–‡β–ˆβ–…β–„β–‚β–                                                     ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–…β–…β–β–ƒβ–…β–ƒβ–β–β–β–ƒβ–β–β–β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–…β–…β–‡ β–ˆ
  161 ΞΌs        Histogram: log(frequency) by time        654 ΞΌs <

 Memory estimate: 78.27 KiB, allocs estimate: 2.

That’s what I expected to happen. Hence my disbelief.

What I have thought about is on various iterations it does various combinations of things like

 xp[i] -  xp[i - 1]
 xp[i + 1] - xp[i]

so when i = 5 that’s

 xp[5] -  xp[4] # xi - xim
 xp[6] - xp[5]  # xip - xi

and when i = 6 that becomes

 xp[6] -  xp[5] # xi - xim
 xp[7] - xp[6] # xip - xi

So perhaps we can trade space for time and pre calculate all of the xp[i] - xp[i - 1] using vector math

Anyway, that was something to post after I had tried it :slight_smile:

Thanks for the suggestion. The binary search seems a much more solid choice for the general case.

My plan is to submit a PR to Interpolations.jl, which I think is more useful to the community than a standalone package. This effort was to get a working version that I could validate against the python version. Under Interpolations.jl, the idea is to pre-calculate the coefficients for all intervals between the nodes and then build an interpolant function that will apply the formula to any value to be interpolated. Here WENO will be similar in application to the already supported monotonic interpolation (which, by the way, in Interpolations.jl makes use of searchsortedfirst).

I am still getting started in Julia, so will probably need some help to get a clean version that can be used in Interpolations.jl.

3 Likes