Can you post the latest version of the code? With the improved searchsortedlast
?
I guess many people (including me) want to give a shot improving that even more
Can you post the latest version of the code? With the improved searchsortedlast
?
I guess many people (including me) want to give a shot improving that even more
Thanks for the offer! Here is the latest version:
function weno4_opt(xs, xp, fp)
Ngrid = size(xp)[1]
Ξ΅ = 1e-6
fs = similar(xs)
left = fp[1]
right = fp[end]
prevΞ²idx = -1
Ξ²2 = 0.0
Ξ²3 = 0.0
@inbounds for (idx, x) in enumerate(xs)
if x < xp[1]
fs[idx] = left
continue
elseif x > xp[end]
fs[idx] = right
continue
else
i = 2
while @inbounds x > xp[i]
i += 1
end
i -= 1
end
if i == Ngrid
i -= 1
end
if i == 1
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
hi = xip - xi
hip = xipp - xip
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
fs[idx] = q3
continue
elseif i == Ngrid - 1
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
him = xi - xim
hi = xip - xi
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
fs[idx] = q2
continue
end
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
him = xi - xim
hi = xip - xi
hip = xipp - xip
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
if i != prevΞ²idx
H = him + hi + hip
yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
yyim -= (him*H) / ((him + hi)*hi*hip) * yip
yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
yyip = (hi*hip) / (him*(him + hi)*H) * yim
yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
Ξ²2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
Ξ²3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
prevΞ²idx = i
end
Ξ³2 = - (x - xipp) / (xipp - xim)
Ξ³3 = (x - xim) / (xipp - xim)
Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)
Ο2 = Ξ±2 / (Ξ±2 + Ξ±3)
Ο3 = Ξ±3 / (Ξ±2 + Ξ±3)
fs[idx] = Ο2*q2 + Ο3*q3
end
return fs
end
now I wonder if we should improve this directlyβ¦
Both the numpy and the Julia algorithms use bissections, and that in general should be faster than the simple search Iβve written. The numpy version has some branches which may deal with this specific data better. There is some investigation to be done there. Still a minimal example confirming in which cases one is faster than the other would be needed, maybe what happens here is not a regular pattern.
This is actually quite typical. Yes, Base functions are well-written and quite optimized, but they are optimized for the general case. If you have an idea for how to address your specific problem, it is likely that you can improve on Base. This is a feature of the language.
I think the other point here was that since the index i
found is always increasing in the enumeration of xs
, you donβt need to always start searching from i=2
. I think you can simply move the statement out of the loop.
The custom version of searchsortedlast
is basically findlast
with a predicate function. Perhaps you just picked the wrong search function as each is optimized for different use cases. For the size of the arrays in question here, the midpoint bisection search algorithm is not advantageous.
And which would be better?
It depends on the size of the array being searched. How many interpolation knots do we expect?
Sorry for not following: I understand there are at least linear search, binary search and interpolation search. Which algorithm are you proposing?
Iβm not proposing anything new. Op wanted to know why searchsortedlast
is not optimal in this case, and Iβm suggesting the linear search in findlast
might have been better, which is analogous to the custom loop here. Those are both functions in Base
.
Reflecting on the thread as a whole, itβs clear that Python has created a perception that you have to use standard library code to be optimal and that rolling your own for loop will be slow. Here we see there is nothing special about Base or the standard library. Itβs just Julia whether you write it yourself or use an existing function.
Thank you for the code.
Before this discussion I was not aware of WENO interpolation methods. However, it seems, that they will be very useful in my studies. I hope you will write the corresponding package.
P.S. Does anyone aware of existing weno interpolation implementations in Julia. I only managed to find weno5
function from Kinetic.jl.
Another point here is that numba is jit-compiled using llvm, just like Julia. For straightforward code like this, one shouldnβt necessarily expect Julia to be faster.
I broke the code out into functions because I couldnβt see the control flow and something wonderful happend - runtime was 62% of the weno4_opt
version
julia> # show the results are identical because I still don't believe it :)
julia> weno4_opt(xi, xx, yy) - weno4_M(xi, xx, yy) |> unique
1-element Vector{Float64}:
0.0
julia> @benchmark weno4_M(xi, xx, yy)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 293.163 ΞΌs β¦ 2.371 ms β GC (min β¦ max): 0.00% β¦ 83.72%
Time (median): 294.908 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 298.795 ΞΌs Β± 46.086 ΞΌs β GC (mean Β± Ο): 0.59% Β± 3.13%
ββββββ
βββββ βββ β
ββββββββββββββββββββ
ββ
ββ
βββββββββββ
βββ
ββ
ββββββββββββββββββββ β
293 ΞΌs Histogram: log(frequency) by time 329 ΞΌs <
Memory estimate: 78.23 KiB, allocs estimate: 2.
julia> @benchmark weno4_opt(xi, xx, yy)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 478.019 ΞΌs β¦ 1.513 ms β GC (min β¦ max): 0.00% β¦ 64.02%
Time (median): 478.699 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 482.409 ΞΌs Β± 41.598 ΞΌs β GC (mean Β± Ο): 0.34% Β± 2.71%
βββ ββββ β β β
ββββ
ββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββ β
478 ΞΌs Histogram: log(frequency) by time 513 ΞΌs <
Memory estimate: 78.23 KiB, allocs estimate: 2.
function iIsOne(x, xp, fp)
+(
fp[1] * ((x - xp[2]) * (x - xp[3])) / ((xp[2] - xp[1]) * (xp[3] - xp[1])),
-fp[2] * ((x - xp[1]) * (x - xp[3])) / ((xp[2] - xp[1]) * (xp[3] - xp[2])),
fp[3] * ((x - xp[1]) * (x - xp[2])) / ((xp[3] - xp[1]) * (xp[3] - xp[2]))
)
end
function iIsNgridMinusOne(i, x, xp, fp)
+(
fp[i-1] * ((x - xp[i]) * (x - xp[i+1])) / ((xp[i] - xp[i-1]) * (xp[i+1] - xp[i-1])),
-fp[i] * ((x - xp[i-1]) * (x - xp[i+1])) / ((xp[i] - xp[i-1]) * (xp[i+1] - xp[i] )),
fp[i+1] * ((x - xp[i-1]) * (x - xp[i] )) / ((xp[i+1] - xp[i-1]) * (xp[i+1] - xp[i] ))
)
end
function iIsNotPrevΞ²idx(i, xp, fp)
xim = xp[i - 1]
xi = xp[i]
xip = xp[i + 1]
xipp = xp[i + 2]
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
yyim = - ((2 * (xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip)) + (xi - xim) * ((xi - xim) + (xip - xi))) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
yyim += (((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
yyim -= ((xi - xim) * ((xi - xim) + (xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
yyim += ((xi - xim) * ((xi - xim) + (xip - xi))) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
yyi = - ((xip - xi) * ((xip - xi) + (xipp - xip))) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
yyi += ((xip - xi) * ((xip - xi) + (xipp - xip)) - (xi - xim) * (2 * (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
yyi += ((xi - xim) * ((xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
yyi -= ((xi - xim) * (xip - xi)) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
yyip = ((xip - xi) * (xipp - xip)) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
yyip -= ((xipp - xip) * ((xi - xim) + (xip - xi))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
yyip += (((xi - xim) + 2 * (xip - xi)) * (xipp - xip) - ((xi - xim) + (xip - xi)) * (xip - xi)) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
yyip += (((xi - xim) + (xip - xi)) * (xip - xi)) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
yyipp = - (((xip - xi) + (xipp - xip)) * (xipp - xip)) / ((xi - xim) * ((xi - xim) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yim
yyipp += ((xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) / ((xi - xim) * (xip - xi) * ((xip - xi) + (xipp - xip))) * yi
yyipp -= (((xip - xi) + (xipp - xip)) * ((xi - xim) + (xip - xi) + (xipp - xip))) / (((xi - xim) + (xip - xi)) * (xip - xi) * (xipp - xip)) * yip
yyipp += ((2 * (xipp - xip) + (xip - xi)) * ((xi - xim) + (xip - xi) + (xipp - xip)) + (xipp - xip) * ((xip - xi) + (xipp - xip))) / (((xip - xi) + (xipp - xip)) * (xipp - xip) * ((xi - xim) + (xip - xi) + (xipp - xip))) * yipp
Ξ²2 = ((xip - xi) + (xipp - xip))^2 * (abs(yyip - yyi) / (xip - xi) - abs(yyi - yyim) / (xi - xim))^2
Ξ²3 = ((xi - xim) + (xip - xi))^2 * (abs(yyipp - yyip) / (xipp - xip) - abs(yyip - yyi) / (xip - xi))^2
Ξ²2, Ξ²3
end
function q2q3(i, x, xp, fp)
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
him = xi - xim
hi = xip - xi
hip = xipp - xip
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
q2, q3
end
function Ο2Ο3(i, x, xp, Ξ²2, Ξ²3)
Ξ΅ = 1e-6
Ξ³2 = - (x - xp[i+2]) / (xp[i+2] - xp[i-1])
Ξ³3 = (x - xp[i-1]) / (xp[i+2] - xp[i-1])
Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)
Ξ±2 / (Ξ±2 + Ξ±3), Ξ±3 / (Ξ±2 + Ξ±3)
end
function weno4_M(xs, xp, fp)
Ngrid = size(xp)[1]
fs = similar(xs)
left = fp[1]
right = fp[end]
prevΞ²idx = -1
Ξ²2 = 0.0
Ξ²3 = 0.0
@inbounds for (idx, x) in enumerate(xs)
if x < xp[1]
fs[idx] = left
continue
elseif x > xp[end]
fs[idx] = right
continue
else
i = 2
while @inbounds x > xp[i]
i += 1
end
i -= 1
end
if i == Ngrid
i -= 1
end
if i == 1
fs[idx] = iIsOne(x, xp, fp)
continue
elseif i == Ngrid - 1
fs[idx] = iIsNgridMinusOne(i, x, xp, fp)
continue
end
q2, q3 = q2q3(i, x, xp, fp)
if i != prevΞ²idx
Ξ²2, Ξ²3 = iIsNotPrevΞ²idx(i, xp, fp)
prevΞ²idx = i
end
Ο2, Ο3 = Ο2Ο3(i, x, xp, Ξ²2, Ξ²3)
fs[idx] = Ο2 * q2 + Ο3 * q3
end
return fs
end
A significant fraction of the runtime of searchsortedlast
is spent performing isless
comparisons. Note that isless
has some extra machinery to handle NaN since weβre using floating-point arguments.
I see float performance improved significantly by using <
instead of the default isless
. THIS SUBSTITUTION WILL PRODUCE INCORRECT RESULTS IF NaN APPEARS IN ANY OF YOUR INPUTS.
using BenchmarkTools
v = cumsum(rand(1000)); v ./= last(v);
@benchmark searchsortedlast($v,z) setup=(z=rand())
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
Range (min β¦ max): 13.627 ns β¦ 26.453 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 16.032 ns β GC (median): 0.00%
Time (mean Β± Ο): 16.049 ns Β± 0.351 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
β β β β β
ββ β β ββ β β
ββββ
βββββββββββββββββ
ββββββ
ββββββββββββββββββββββββββ
ββββββ β
13.6 ns Histogram: log(frequency) by time 17 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark searchsortedlast($v,z;lt=<) setup=(z=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min β¦ max): 7.908 ns β¦ 24.525 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 10.711 ns β GC (median): 0.00%
Time (mean Β± Ο): 10.643 ns Β± 0.888 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
β β
β
β β β β
β
βββββββββββββββββββββββββββββ
βββββββββ
ββ
βββββββββββββββββββ β
7.91 ns Histogram: frequency by time 12.2 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
See Fastest data structure for a priority queue - General Usage / Performance - JuliaLang
Interestingly, the break-even for binary search appears close to the MWEβs vector length 20 for xp and yp:
n = 20
271.237 ΞΌs (2 allocations: 78.23 KiB)
274.209 ΞΌs (2 allocations: 78.23 KiB)
n = 200
837.772 ΞΌs (2 allocations: 78.23 KiB)
356.019 ΞΌs (2 allocations: 78.23 KiB)
n = 2000
4.551 ms (2 allocations: 78.23 KiB)
523.962 ΞΌs (2 allocations: 78.23 KiB)
function weno4_opt(xs, xp, fp)
Ngrid = size(xp)[1]
Ξ΅ = 1e-6
fs = similar(xs)
left = fp[1]
right = fp[end]
prevΞ²idx = -1
Ξ²2 = 0.0
Ξ²3 = 0.0
@inbounds for (idx, x) in enumerate(xs)
if x < xp[1]
fs[idx] = left
continue
elseif x > xp[end]
fs[idx] = right
continue
else
i = 2
while @inbounds x > xp[i]
i += 1
end
i -= 1
end
if i == Ngrid
i -= 1
end
if i == 1
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
hi = xip - xi
hip = xipp - xip
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
fs[idx] = q3
continue
elseif i == Ngrid - 1
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
him = xi - xim
hi = xip - xi
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
fs[idx] = q2
continue
end
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
him = xi - xim
hi = xip - xi
hip = xipp - xip
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
if i != prevΞ²idx
H = him + hi + hip
yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
yyim -= (him*H) / ((him + hi)*hi*hip) * yip
yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
yyip = (hi*hip) / (him*(him + hi)*H) * yim
yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
Ξ²2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
Ξ²3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
prevΞ²idx = i
end
Ξ³2 = - (x - xipp) / (xipp - xim)
Ξ³3 = (x - xim) / (xipp - xim)
Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)
Ο2 = Ξ±2 / (Ξ±2 + Ξ±3)
Ο3 = Ξ±3 / (Ξ±2 + Ξ±3)
fs[idx] = Ο2*q2 + Ο3*q3
end
return fs
end
function weno4_opt_binsearch(xs, xp, fp)
Ngrid = size(xp)[1]
Ξ΅ = 1e-6
fs = similar(xs)
left = fp[1]
right = fp[end]
prevΞ²idx = -1
Ξ²2 = 0.0
Ξ²3 = 0.0
@inbounds for (idx, x) in enumerate(xs)
if x < xp[1]
fs[idx] = left
continue
elseif x > xp[end]
fs[idx] = right
continue
else
### linear search
#i = 2
#while @inbounds x > xp[i]
# i += 1
#end
#i -= 1
### binary search
ileft = 2
i = Ngrid - 1
while ileft <= i
mid = (ileft + i) >> 1
if @inbounds x < xp[mid]
i = mid - 1
else
ileft = mid + 1
end
end
end
if i == Ngrid
i -= 1
end
if i == 1
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
hi = xip - xi
hip = xipp - xip
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
fs[idx] = q3
continue
elseif i == Ngrid - 1
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
him = xi - xim
hi = xip - xi
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
fs[idx] = q2
continue
end
xim = xp[i-1]
xi = xp[i]
xip = xp[i+1]
xipp = xp[i+2]
him = xi - xim
hi = xip - xi
hip = xipp - xip
yim = fp[i-1]
yi = fp[i]
yip = fp[i+1]
yipp = fp[i+2]
q2 = yim * ((x - xi) * (x - xip)) / (him * (him + hi))
q2 -= yi * ((x - xim) * (x - xip)) / (him * hi)
q2 += yip * ((x - xim) * (x - xi)) / ((him + hi) * hi)
q3 = yi * ((x - xip) * (x - xipp)) / (hi * (hi + hip))
q3 -= yip * ((x - xi) * (x - xipp)) / (hi * hip)
q3 += yipp * ((x - xi) * (x - xip)) / ((hi + hip) * hip)
if i != prevΞ²idx
H = him + hi + hip
yyim = - ((2*him + hi)*H + him*(him + hi)) / (him*(him + hi)*H) * yim
yyim += ((him + hi)*H) / (him*hi*(hi + hip)) * yi
yyim -= (him*H) / ((him + hi)*hi*hip) * yip
yyim += (him*(him + hi)) / ((hi + hip)*hip*H) * yipp
yyi = - (hi*(hi + hip)) / (him*(him + hi)*H) * yim
yyi += (hi*(hi + hip) - him*(2*hi + hip)) / (him*hi*(hi + hip)) * yi
yyi += (him*(hi + hip)) / ((him + hi)*hi*hip) * yip
yyi -= (him*hi) / ((hi + hip)*hip*H) * yipp
yyip = (hi*hip) / (him*(him + hi)*H) * yim
yyip -= (hip*(him + hi)) / (him*hi*(hi + hip)) * yi
yyip += ((him + 2*hi)*hip - (him + hi)*hi) / ((him + hi)*hi*hip) * yip
yyip += ((him + hi)*hi) / ((hi + hip)*hip*H) * yipp
yyipp = - ((hi + hip)*hip) / (him*(him + hi)*H) * yim
yyipp += (hip*H) / (him*hi*(hi + hip)) * yi
yyipp -= ((hi + hip) * H) / ((him + hi)*hi*hip) * yip
yyipp += ((2*hip + hi)*H + hip*(hi + hip)) / ((hi + hip)*hip*H) * yipp
Ξ²2 = (hi + hip)^2 * (abs(yyip - yyi) / hi - abs(yyi - yyim) / him)^2
Ξ²3 = (him + hi)^2 * (abs(yyipp - yyip) / hip - abs(yyip - yyi) / hi)^2
prevΞ²idx = i
end
Ξ³2 = - (x - xipp) / (xipp - xim)
Ξ³3 = (x - xim) / (xipp - xim)
Ξ±2 = Ξ³2 / (Ξ΅ + Ξ²2)
Ξ±3 = Ξ³3 / (Ξ΅ + Ξ²3)
Ο2 = Ξ±2 / (Ξ±2 + Ξ±3)
Ο3 = Ξ±3 / (Ξ±2 + Ξ±3)
fs[idx] = Ο2*q2 + Ο3*q3
end
return fs
end
using BenchmarkTools
n = 20
xp = sort(randn(n)) # mimicking the OP's data
yp = -sort(-rand(n))
x = collect(LinRange(minimum(xp), maximum(xp), 10001))
@btime weno4_opt(x, xp, yp) evals=1000
@btime weno4_opt_binsearch(x, xp, yp) evals=1000
Thanks for the suggestion. In my tests, using the same length(xi)=10001
, your version is actually slightly slower:
julia> @benchmark weno4_opt(xi, xx, yy) evals=10
BenchmarkTools.Trial: 2777 samples with 10 evaluations.
Range (min β¦ max): 135.900 ΞΌs β¦ 777.403 ΞΌs β GC (min β¦ max): 0.00% β¦ 73.59%
Time (median): 169.912 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 179.509 ΞΌs Β± 65.292 ΞΌs β GC (mean Β± Ο): 4.77% Β± 9.85%
β ββββ
ββββ β
βββββββββββββββ
β
ββββββββββββββββββββββββββββββββββββββββββ
β
β
β β
136 ΞΌs Histogram: log(frequency) by time 635 ΞΌs <
Memory estimate: 78.27 KiB, allocs estimate: 2.
julia> @benchmark weno4_M(xi, xx, yy) evals=10
BenchmarkTools.Trial: 2479 samples with 10 evaluations.
Range (min β¦ max): 160.647 ΞΌs β¦ 856.511 ΞΌs β GC (min β¦ max): 0.00% β¦ 78.87%
Time (median): 191.992 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 201.213 ΞΌs Β± 65.491 ΞΌs β GC (mean Β± Ο): 4.24% Β± 9.47%
β ββββ
βββ β
βββββββββββββ
β
βββ
ββββββββββββββββββββββββββββββββββββββββββ
β
β β
161 ΞΌs Histogram: log(frequency) by time 654 ΞΌs <
Memory estimate: 78.27 KiB, allocs estimate: 2.
Thatβs what I expected to happen. Hence my disbelief.
What I have thought about is on various iterations it does various combinations of things like
xp[i] - xp[i - 1]
xp[i + 1] - xp[i]
so when i = 5 thatβs
xp[5] - xp[4] # xi - xim
xp[6] - xp[5] # xip - xi
and when i = 6 that becomes
xp[6] - xp[5] # xi - xim
xp[7] - xp[6] # xip - xi
So perhaps we can trade space for time and pre calculate all of the xp[i] - xp[i - 1]
using vector math
Anyway, that was something to post after I had tried it
Thanks for the suggestion. The binary search seems a much more solid choice for the general case.
My plan is to submit a PR to Interpolations.jl, which I think is more useful to the community than a standalone package. This effort was to get a working version that I could validate against the python version. Under Interpolations.jl, the idea is to pre-calculate the coefficients for all intervals between the nodes and then build an interpolant function that will apply the formula to any value to be interpolated. Here WENO will be similar in application to the already supported monotonic interpolation (which, by the way, in Interpolations.jl makes use of searchsortedfirst
).
I am still getting started in Julia, so will probably need some help to get a clean version that can be used in Interpolations.jl.