Slow cubature

I’m trying the perform a numerical integration over 4-dimensions. Here is a self-contained example:

import HCubature
using StaticArrays: SMatrix
using StaticArrays: SVector
using LinearAlgebra: Symmetric
using LinearAlgebra: issymmetric
using LogExpFunctions: logcosh

@kwdef struct QuadOpts
    rtol::Float64 = sqrt(eps())
    atol::Float64 = 0.0
    maxevals::Int = typemax(Int)
    initdiv::Int = 1

function mvnormal_expect(logf::Function, g::Function, Σ::AbstractMatrix, quad::QuadOpts = QuadOpts())
    μ = zeros(eltype(Σ), size(Σ, 1))
    return mvnormal_expect(logf, g, μ, Σ, quad)

function mvnormal_expect(logf::Function, g::Function, μ::AbstractVector, Σ::AbstractMatrix, quad::QuadOpts = QuadOpts())
    @assert size(Σ, 1) == size(Σ, 2) == length(μ)
    @assert issymmetric(Σ)

    N = length(μ)
    U = oftype(Σ, sqrt(Symmetric(Matrix(Σ))))

    t_lb = fill(-1, N)
    t_ub = fill(+1, N)

    log_normalization = N/2 * log(2π) # Includes Jacobian of z = Σ * x

    cubature_result = HCubature.hcubature(t_lb, t_ub; quad.rtol, quad.atol, quad.maxevals, quad.initdiv) do t
        t::Union{Real, AbstractVector}

        # change of variables from t ∈ (-1,1) to x ∈ (-Inf, +Inf)
        z = t ./ (1 .- t.^2)
        jacobian = prod(@. (1 + t^2) / (1 - t^2)^2)

        # log density of the multivariate normal
        log_pdf = -z' * z / 2 - log_normalization

        # Evaluate target functions
        x = U' * z + μ
        g_result = g(x)::Union{Real,AbstractArray}
        log_f_result = logf(x)::Real

        return g_result * exp(log_f_result + log_pdf) * jacobian

    return first(cubature_result)

function integrand_logf(z::AbstractVector)
    η, ζ = z[SVector(1,2)], z[SVector(3,4)]
    return 10 * (sum(logcosh, η) - sum(logcosh, ζ))

function integrand_g(z::AbstractVector)
    t = tanh.(z)
    return t * t'

Σ = SMatrix{4,4}([0.0625 0.03125 0.04340661301312787 0.022453469438932686; 0.03125 0.0625 0.02081381050160219 0.043421385231874735; 0.04340661301312787 0.02081381050160219 0.05663074530536689 0.028551532538083665; 0.022453469438932686 0.043421385231874735 0.028551532538083665 0.055918260634255944]);
@time result_h = mvnormal_expect(integrand_logf, integrand_g, Symmetric(Σ))

However this calculation is very slow. On my machine it takes about 1 minute to complete. I need to compute such an integration many times, so it would be better if it were faster.

Any suggestions on how I can speed things up? Perhaps HCubature is not the best algorithm to try here.

This is not a static array, which will slow down everything because it will make your integrand allocating. Try to make sure your integrand is non-allocating for SVector arguments t. (Also make t_lb and t_ub static arrays or tuples.)

That said, this might be a situation where a specialized quadrature scheme like Hermite–Gauss quadrature may be be a good idea, to avoid the singularity you are introducing from the coordinate transformation for an infinite domain. (Using a non-adaptive method requires some manual tuning of the number of quadrature points, however.)

It might be worthwhile to change to N-d spherical coordinates so that only one (radial) coordinate has singular endpoints.

PS. Type assertions like t::Union{Real, AbstractVector} probably accomplish nothing.


Do the singularities lead to a larger amount of function evaluations and thus slow things down?

I haven’t found any implementation of N-d spherical coordinates in Julia (only for 3d in CoordinateTransforms.jl). It seems a bit heavy to implement generically, perhaps you had a simpler idea in mind?

Yes. The convergence rate of quadrature schemes is usually limited by the largest singularity that you haven’t accounted for analytically in the method. See e.g. Hcubature_count? - #6 by stevengj

How many values of N do you actually need? Since in practice you probably only have a few cases, you could easily just implement them by hand and dispatch on the dimensionality N, which is part of the argument types, via StaticArrays. i.e. implement a function

z, jacobian = spherical_coords(t)

(Though I think it wouldn’t actually be too much work to implement this generically for arbitrary N, in a way that unrolls all the loops statically, if you are careful, even without using fancy tricks like generated functions.)

1 Like

For example:

# map t ∈ (0,1)ⁿ first to hyper-spherical coordinates (ρ,θ₁,θ₂,…,θₙ₋₁)
# with a singular coordinate transform on t[1], a scaling by π of
# t[2:end-1], and a scaling by 2π of t[end], and then transforming
# to a Cartesian vector x, returning x and the Jacobian factor.
# (Throws an error if n < 2.)
function spherical_coords(t::SVector{n}) where {n}
    piT = float(eltype(t))(pi)
    ρ = t[1] / (1 - t[1]) # map [0,1) to [0,∞)
    jacobian = inv((1 - t[1])^2) * (piT^(n-1) * 2) # from coordinate scalings
    θ = (ntuple(i -> t[i+1] * piT, Val{n-2}())..., t[n] * 2piT)
    sinθ = sin.(θ)
    sinθprod = cumprod(sinθ)
    x′ = ntuple(i -> ρ * cos(θ[i+1]) * sinθprod[i], Val{n-2}())
    x = SVector(ρ * cos(θ[1]), x′..., ρ * sinθprod[n-1])
    jacobian *= ρ^(n-1) * prod(ntuple(i -> sinθ[i]^(n-1-i), Val{n-2}()))
    return x, jacobian

(This is just copying the formulas from the Wikipedia hyperspherical coordinates article, heavily using ntuple with Val to make sure that the compiler knows all of the lengths statically and the routine is non-allocating.) I haven’t checked it for correctness, so you should test this before using.


I implemented spherical coordinates for a few dimensions I need (2, 3, 4), and now the integration is 2x faster. That’s still slow, but at least I can get something done.

Not sure if trying to optimize more is even possible.

Someone should do a PR with this over at Hyperspherical coordinates · Issue #85 · JuliaGeometry/CoordinateTransformations.jl · GitHub. If I find the time in the next few weeks I can give it a shot.

Did you eliminate allocations from your integrand by fixing μ?

Yes, thank you for noticing that too!

Do you have any particular suggestion for this? I was looking at FastGaussQuadrature.jl, but it seems to support only 1-dimensional quadrature, there is no method for cubature. Perhaps you’re suggesting I should just nest quadratures?

For cubature you can just 1d quadrature rules along each dimension (a “tensor product” of 1d rules). See e.g. Gauss quadrature in 3D with change of variables - #2 by stevengj

I’m trying the nested Hermite–Gauss 1d quadrature rules.

Any suggestion on what’s a good way to decide that the number of points I’m using is enough?

Like perhaps doing N points and then N+1, and looking at the difference of the estimate of the integral and see if it is small enough (relative to the value of the integral).

Right, keep increasing the number until the answer stops changing to your desired tolerance. Often people repeatedly double the number of points, though in theory if you have exponential convergence that may be overkill and an additive constant could suffice. In any case, I wouldn’t increase it by only {}+ 1 — you want to increase it by enough that the larger N is much more accurate, enough that the difference between the two is a good proxy for the error in the smaller-N quadrature. There is also the computational expense of the repeated evaluations to take into account, which is a good argument for increasing N geometrically (so that the total number of function evaluations is proportional to the final N).

Of course, if you know enough about your integrand to have a better sense of the convergence rate, you could also do fancier extrapolation techniques, in principle.

Not having to mess around with this sort of thing is a big reason for the popularity of adaptive/nested quadrature schemes, but I don’t know of an adaptive implementation of Gauss–Hermite. In principle, you could use QuadGK.jl to get a Kronrod rule for error estimation with Gauss–Hermite quadrature, but I haven’t checked whether Kronrod rules work for this type of quadrature. (They don’t work for Gauss–Laguerre.)

1 Like

Nope, unfortunately, Gauss–Hermite seems to be similar to Gauss–Laguerre in that a corresponding Kronrod rule does not seem to exist (at least, not with real-valued abscissae). Using the Jacobi matrix for Gauss–Hermite rules from FastGaussQuadrature.jl, I get:

julia> using QuadGK, LinearAlgebra

julia> n = 10; J = SymTridiagonal(zeros(n), sqrt.((1:n-1)/2));  # Jacobi matrix

julia> kx, kw, gw = kronrod(J, 5, sqrt(π))
ERROR: ArgumentError: real Gauss–Kronrod rule does not exist for this Jacobi matrix

whereas gauss(J, sqrt(π)) works fine and reproduces the Hermite–Gauss points and weights from FastGaussQuadrature.jl (so it’s definitely the right Jacobi matrix).

Apparently, this is already known: see Refs. 7,11 of Genz and Keister (1996). However, the same paper shows that Patterson-type error estimates can be extended to include Gauss–Hermite quadrature. Another (better?) approach to error estimation for Gauss–Hermite rules was presented in Ehrich (2002). I’m not sure if there is code available for either of these anywhere.

I don’t know if it’d help here, but the divonne routine in Cuba.jl allows passing the position of known peaks to speed up integration (but this package is a wrapper around a C library and can’t use tricks like StaticArrays, so performance may not be much better in the end)

I presume that’s the xgiven keyword argument (together with ldxgiven?). But I could not find any example of how it works?


I don’t think I ever used it from the Julia wrapper (I used this feature of divonne several years ago from C, before I knew about Julia, but I don’t think I can easily fish out my example, can’t even remember what code was it) but it should be the same as using the C API.

Turns out decreasing the requested rtol,atol even a bit, speeds up the cubature significantly!

Ok, I found the old (from 2015) code where I used xgiven, it was in Fortran, not even in C.

Consider the integral between 1 and 0 of the Gaussian function

f(x) = \exp\left(-\left(\frac{x - x_0}{\mu}\right) ^ 2\right)

The value of the integral is

\frac{\sqrt{\pi} \mu}{2} \left(\mathrm{erf}\left(\frac{1 - x_0}{\mu}\right) + \mathrm{erf}\left(\frac{x_0}{\mu}\right)\right)

Let’s compute the expected result with SpecialFunctions and the parameters (x_0 = 0.5, \mu = 0.0001):

julia> using SpecialFunctions

julia> expected(x₀, μ) = 1/2 * sqrt(pi) * μ * (erf((1 - x₀) / μ) + erf(x₀ / μ))
expected (generic function with 1 method)

julia> expected(0.5, 0.0001)

Now let’s compute the integral numerically with Cuba.jl:

julia> using Cuba

julia> f(x, x₀=0.5, μ=0.0001) = exp(-((x - x₀) / μ) ^ 2)
f (generic function with 3 methods)

julia> integrand(x, y) = y[1] = f(x[1])
integrand (generic function with 1 method)

julia> divonne(integrand)
 1: 8.862365018700048e-5 ± 1.579627496207042e-10 (prob.: 0.0)
Integrand evaluations: 5257
Number of subregions:  40
Note: The desired accuracy was reached

julia> divonne(integrand; ngiven=1, ldxgiven=1, xgiven=[0.5;;])
 1: 0.0001772444738865438 ± 3.5121869189594667e-10 (prob.: 0.0)
Integrand evaluations: 9217
Number of subregions:  70
Note: The desired accuracy was reached

julia> cuhre(integrand)
 1: 2.608731354364103e-6 ± 2.135531666691244e-10 (prob.: 1.0)
Integrand evaluations: 35165
Number of subregions:  271
Note: The desired accuracy was reached

julia> cuhre(integrand; rtol=1e-8)
 1: 0.00017723214024825766 ± 1.7700547231374744e-12 (prob.: 1.0)
Integrand evaluations: 150865
Number of subregions:  1161
Note: The desired accuracy was reached

julia> cuhre(integrand; rtol=1e-12)
 1: 0.0001772412672726941 ± 9.869539325496199e-13 (prob.: 1.0)
Integrand evaluations: 154375
Number of subregions:  1188
Note: The desired accuracy was reached

The most accurate result is with divonne when passing the position of the peaks, which takes more evaluations than when not passing the peaks (but the result is also off from the correct result, although within the default, but quite low, tolerance), but it takes orders of magnitude fewer evaluations than cuhre, which I believe uses a similar algorithm to HCubature.