How do I do a fast cumulative integration

What are my options for a fast cumulative integration? I can use one of many quadrature packages but this does not scale.

using QuadGK
f = cos
x₁ = 0.0
int_f(x₂) = quadgk(f, x₁, x₂)[1]

using Plots
plot([f, int_f], xlims=(x₁,2π))

Can you suggest some packages? Maybe I am missing something obvious.

It looks like the NumericalIntegration.jl package has a cumul_integrate function.

I saw that package. It only does a simple cumulative sum using trapezoidal rule.

Is there a reason why most integration packages don’t provide an easy interface for a cumulative sum?

Maybe something like

using QuadGK, PyPlot
cumulative_integrate(f, xs) =
    cumsum(quadgk(f, x1, x2)[1] for (x1, x2) in zip(xs[1:end-1], xs[2:end]))
xs = 0:1e-3:2pi
plot(xs[2:end], cumulative_integrate(cos, xs))

(I have no idea if this is faster/slower than other proposals.)

ApproxFun does this:

julia> f = Fun(x->exp(-x^2),0..5);

julia> g = 2/sqrt(π)*integrate(f); h = g - g(0);

julia> x = 0:0.5:5; [f.(x) h.(x)]
11×2 Matrix{Float64}:
 1.0         0.0
 0.778801    0.5205
 0.367879    0.842701
 0.105399    0.966105
 0.0183156   0.995322
 0.00193045  0.999593
 0.00012341  0.999978
 4.78512e-6  0.999999
 1.12535e-7  1.0
 1.60523e-9  1.0
 1.3888e-11  1.0

For a univariate function, I doubt you will be able to do much better.


The basic reason is that numerical integration algorithms are formulated to produce a single number, the definite integral over the entire interval. For a “cumulative sum” you want an algorithm that produces a function, and such algorithms generally look very different.

Of course, it’s easy to get a cumsum from something like a trapezoidal rule or similar low-order integration methods that just sum a bunch of local integral estimates over small intervals, typically from equally-spaced samples. But more sophisticated integration techniques like Gaussian quadrature are typically exponentially more accurate than the trapezoidal rule, so you are sacrificing a lot if you use the simpler local methods (ala NumericalIntegration.jl or Trapz.jl).

The more sophisticated integration methods typically work by implicitly constructing one or more high-order polynomial approximations of your function (carefully sampled at non-equispaced points) and then integrating that polynomial, but the methods are cleverly formulated to avoid explicitly constructing the polynomial(s). They instead just sample the function at carefully chosen points, multiply by some precomputed weights (that correspond mathematically to integrating an interpolating polynomial), and sum to get the estimated integral (over the whole interval only).

To instead get the “cumsum” accurately, i.e. to get the integrated function at arbitrary points in the interval, you need a very different type of formulation that explicitly constructs the interpolating polynomial(s) in some form, at which point you can integrate the polynomial to obtain another polynomial and evaluate it wherever you want. This is exactly what ApproxFun.jl does.


Depending on the use case, you could try quick & Dierckx (or better, BSplineKit.jl).

The code below seems to run about 5x faster than the ApproxFun example above, for similar computed precision:

using Dierckx
ef(x) = exp(-x^2)
xfine = 0.0:0.05:5.0
spl = Spline1D(xfine, ef.(xfine))
x = 0.0:0.5:5.0
hx = 2/sqrt(π) * Dierckx.integrate.((spl,), (0.,), x)
[ef.(x) hx]

 1.0          0.0
 0.778801     0.5205
 0.367879     0.842701
 0.105399     0.966105
 0.0183156    0.995322
 0.00193045   0.999593
 0.00012341   0.999978
 4.78512e-6   0.999999
 1.12535e-7   1.0
 1.60523e-9   1.0
 1.38879e-11  1.0
1 Like

So changing the question slightly, If I want to integrate measured data rather than a function, probably the best approach is that taken by rafael, namely perform a spline fit and use a low order trapz or cumsum type of approach?

If you fit a spline to the data, I believe you can do the subsequent integration exactly, since splines are polynomials.