Hi, I have performance problem regarding numerical integration and interpolation. I want to perform numerical integration with interpolation like below.

using Dierckx
using QuadGK
function test()
x = 0:0.01:1000
y = sin.(x)
y_int = Spline1D(x, y)
return quadgk(y_int, 0, 10*2*pi)[1]
end
@time test()

in the above case, we have the analytic representation of y, but in my actual case, y is the observed data, so I want to use interpolation. The problem is this calculation takes lots of time: 14.963719 seconds (30 allocations: 39.228 MiB)

In the python code,

import numpy as np
from scipy import interpolate
from scipy.integrate import quad
import time
def test():
x = np.linspace(0, 1000, 100000)
y = np.sin(x)
y_int = interpolate.interp1d(x, y, "cubic")
val = quad(y_int, 0, 10*2*np.pi)
return val
t1 = time.time()
y = test()
print("elapsed time", time.time() - t1)

this is much faster than the original Julia code; elapsed time 0.01470804214477539

What is wrong with my Julia code, and how can I fix it?

A common difference between Python and Julia is the way how one usually times functions. In Julia, the first execution of a function is sometimes much slower (due to the JIT-compilation). If you call the same function again (in the same session), then it will be much faster afterwards.

Thank you for your comment. Yeah, I know that point, but actually, the function does not become faster even the second time.

using Dierckx
using QuadGK
using BenchmarkTools
function test()
x = 0:0.01:1000
y = sin.(x)
y_int = Spline1D(x, y)
quadgk(y_int, 0, 10*2*pi)
end
@btime test()

with the result of 14.671 s (30 allocations: 39.23 MiB)

In this case the problem is that quadgk and scipy’s quad use different default tolerances.

Following the documentation, quadgk uses an absolute tolerance atol = 0 and relative tolerance rtol = sqrt(eps), while scipy’s quad uses the equivalent of atol = sqrt(eps) and rtol = sqrt(eps).

The default choice of quadgk is not the best in your case, as the analytical value of your integral is exactly zero. This is actually mentioned in the quadgk docs:

(Note that it is useful to specify a positive atol in cases where norm(I) may be zero.)

Choosing atol = sqrt(eps(Float64)) should bring the evaluation time close to the one you see in python.

Since you’re fitting a spline, you might as well integrate the spline?

julia> using Dierckx
julia> function test()
x = 0:0.01:1000
y = sin.(x)
y_int = Spline1D(x, y)
return Dierckx.integrate(y_int, 0, 10*2*pi)
end
test (generic function with 1 method)
julia> @time test()
0.025672 seconds (17 allocations: 20.219 MiB)
1.8671935074722466e-14

The tolerance business is the direct cause, but this is not the way to integrate sampled data. Quadgk assumes it has access to the real underlying function, which it hasn’t.

Instead, you should use standard methods, like trapezoidal, Simpson’s or Romberg integration.