How to speed up the numerical integration with interpolation

J_I · March 17, 2023, 5:42am

Hi, I have performance problem regarding numerical integration and interpolation. I want to perform numerical integration with interpolation like below.

using Dierckx
using QuadGK

function test()
    x = 0:0.01:1000
    y = sin.(x) 
    y_int = Spline1D(x, y)
    return quadgk(y_int, 0, 10*2*pi)[1]
end

@time test()

in the above case, we have the analytic representation of y, but in my actual case, y is the observed data, so I want to use interpolation. The problem is this calculation takes lots of time: 14.963719 seconds (30 allocations: 39.228 MiB)

In the python code,

import numpy as np
from scipy import interpolate
from scipy.integrate import quad
import time


def test():
    x = np.linspace(0, 1000, 100000)
    y = np.sin(x)   
    y_int = interpolate.interp1d(x, y, "cubic")
    val = quad(y_int, 0, 10*2*np.pi)
    return val

t1 = time.time()
y = test()
print("elapsed time", time.time() - t1)

this is much faster than the original Julia code; elapsed time 0.01470804214477539

What is wrong with my Julia code, and how can I fix it?

SteffenPL · March 17, 2023, 6:04am

A common difference between Python and Julia is the way how one usually times functions. In Julia, the first execution of a function is sometimes much slower (due to the JIT-compilation). If you call the same function again (in the same session), then it will be much faster afterwards.

A common tool to benchmark is therefore GitHub - JuliaCI/BenchmarkTools.jl: A benchmarking framework for the Julia language.

You concrete example looks actually fine, I would guess that if you call @time test() twice, then the function might be much faster.

J_I · March 17, 2023, 6:25am

Thank you for your comment. Yeah, I know that point, but actually, the function does not become faster even the second time.

using Dierckx
using QuadGK
using BenchmarkTools

function test()
    x = 0:0.01:1000
    y = sin.(x) 
    y_int = Spline1D(x, y)
    quadgk(y_int, 0, 10*2*pi)
end

@btime test()

with the result of 14.671 s (30 allocations: 39.23 MiB)

jipolanco · March 17, 2023, 6:34am

In this case the problem is that quadgk and scipy’s quad use different default tolerances.

Following the documentation, quadgk uses an absolute tolerance atol = 0 and relative tolerance rtol = sqrt(eps), while scipy’s quad uses the equivalent of atol = sqrt(eps) and rtol = sqrt(eps).

The default choice of quadgk is not the best in your case, as the analytical value of your integral is exactly zero. This is actually mentioned in the quadgk docs:

(Note that it is useful to specify a positive atol in cases where norm(I) may be zero.)

Choosing atol = sqrt(eps(Float64)) should bring the evaluation time close to the one you see in python.

jishnub · March 17, 2023, 6:42am

Since you’re fitting a spline, you might as well integrate the spline?

julia> using Dierckx

julia> function test()
           x = 0:0.01:1000
           y = sin.(x) 
           y_int = Spline1D(x, y)
           return Dierckx.integrate(y_int, 0, 10*2*pi)
       end
test (generic function with 1 method)

julia> @time test()
  0.025672 seconds (17 allocations: 20.219 MiB)
1.8671935074722466e-14

J_I · March 17, 2023, 7:33am

Thank you very much, and I confirmed that your code efficiently works in my actual case!
Thanks again!

J_I · March 17, 2023, 7:37am

Thank you for your suggestion. I overlooked the mention in the document. Your suggestion works in my case! Thanks again.

DNF · March 17, 2023, 7:40am

The tolerance business is the direct cause, but this is not the way to integrate sampled data. Quadgk assumes it has access to the real underlying function, which it hasn’t.

Instead, you should use standard methods, like trapezoidal, Simpson’s or Romberg integration.