Speeding up N-dimensional integration

owiecc · September 30, 2025, 9:14am

I am struggling to speed up a N-dimensional integration problem. To me it looks that allocations are the problem. I chased as many allocations I could find in the code but I still would like to speed-up the code by at least two orders of magnitude.

@btime ich1_pol_(8e-8)
436.081 ms (4472909 allocations: 315.91 MiB)

It is a four dimensional function and I would like to integrate three of the dimensions and visualise the resulting function ich1_pol_ .

I see order of integrations is very important: Unreasonable slow speed in numerical integration - #17 by swish47 but I don’t have the intuition on how to split up the problem. Right now I am using the Cubature.jl package.

yolhan_mannes · September 30, 2025, 9:29am

From a quick review I see you’re using an insane amount of non constant global variables, i think this is a real bottleneck, it’s not that bad for Matrix because the inner element will be type stable but with SMatrix all elements will end up being considered Any

owiecc · September 30, 2025, 10:23am

I know about the global scope. For reals I run the code in a function and there is almost no performance difference.

Here is a profiler view for the code: pastebin

All allocations are in the body of f_Ich_pol where I do not see anything that could possibly allocate and in the hcubature function.

yolhan_mannes · September 30, 2025, 10:34am

Oh ok will try home later on

sgaure · September 30, 2025, 11:31am

There’s this:

f_Ich_pol_(x) = f_Ich_pol(x[1], x[2], x[3], x[4])

ich1_pol_(r1) = hcubature(x -> f_Ich_pol_([r1, x[1], x[2], x[3]]), 
                          (int_ϕ1[1],int_r2[1],int_ϕ2[1]), 
                          (int_ϕ1[2],int_r2[2],int_ϕ2[2]); 
                         reltol=1e-3)[1]

You allocate a vector [r1, x[1], x[2], x[3]] which you immediately unpack in f_Ich_pol_. Have you tried to use a tuple instead?

ich1_pol_(r1) = hcubature(x -> f_Ich_pol_((r1, x[1], x[2], x[3])), 
...

owiecc · September 30, 2025, 12:13pm

Thanks! 10% less allocations but speed is still the same

owiecc · September 30, 2025, 12:18pm

The only allocations I see now are here:

        0 function f_Ich_pol(r1, ϕ1, r2, ϕ2)
  4398240     Ich = @SMatrix[pol2cart(r1, ϕ1); pol2cart(r2, ϕ2)]
 58747920     return r1 * r2 * f_Ich(Ich)
        - end

where e.g.

pol2cart(r, ϕ) = r * (cos(ϕ) + sin(ϕ) * 1im)

that returns a ComplexF64 so I do not understand where do allocations come from in line Ich = @SMatrix.

yolhan_mannes · September 30, 2025, 12:37pm

reduced quite a bit (4x speedup for me)


using LinearAlgebra, QuadGK, Distributions, StaticArrays, StatsBase, Parameters, Plots, Cubature


const f = 7.5e6
const C1 = 100e-12
const C2 = 100e-12
const L1 = 20e-9
const L2 = 25e-9
const Lc = 20e-9
const V1 = 100
const Vg = 20
const circuit = (C1=C1, C2=C2, L1=L1, L2=L2, Lc=Lc, V1=V1, Vg=Vg)
const T = 1e-3
const s = 2π / T + 2π * f * 1im

# @unpack C1, C2, L1, L2, Lc, V1, Vg = circuit

const Γg₁, ΓVth₁, θ₁ = Gamma(182.3, 0.0303), Gamma(172.27, 0.04218), -7.5
const Γg₂, ΓVth₂, θ₂ = Gamma(182.3, 0.0303), Gamma(172.27, 0.04218), -7.5

# Define vectors and matrices

A() = @SMatrix [1/(s*L1)+s*C1 0 -s*C1 -1/(s*L1) 0; 0 1/(s*L2)+s*C2 -s*C2 -1/(s*L2) 0; -s*C1 -s*C2 s*(C1+C2) 0 1; -1/(s*L1) -1/(s*L2) 0 1/(s*Lc)+1/(s*L1)+1/(s*L2) 0; 0 0 1 0 0]
B(g) = @SMatrix [-g[1] 0 0 0 0; 0 -g[2] 0 0 0; g[1] g[2] 0 0 0; 0 0 0 0 0; 0 0 0 0 0]
MI() = @SMatrix [1 0 0 0 0; 0 1 0 0 0]
mD(g) = @SMatrix [g[1]*Vg/s; g[2]*Vg/s; -(g[1] + g[2])*Vg/s; 0; 0]
MD(g) = @SMatrix [-g[1]/s 0; 0 -g[2]/s; g[1]/s g[2]/s; 0 0; 0 0]
Iinit() = @SMatrix [-C1 * (V1 - Vg + mean(ΓVth₁)); -C2 * (V1 - Vg + mean(ΓVth₂)); (C1 + C2) * (V1 - Vg) + C1 * mean(ΓVth₁) + C2 * mean(ΓVth₂); 0; V1 / s]

# Define distributions

frankcopula_density(u₁, u₂, θ) = θ * (1 - exp(-θ)) * exp(-θ * (u₁ + u₂)) * ((1 - exp(-θ)) - (1 - exp(-θ * u₁)) * (1 - exp(-θ * u₂)))^-2

Finv_Vth1(u) = quantile(ΓVth₁, u)
Finv_g1(u) = quantile(Γg₁, u)
Finv_Vth2(u) = quantile(ΓVth₂, u)
Finv_g2(u) = quantile(Γg₂, u)

f_Vth₁_g₁(v, g) = frankcopula_density(cdf(ΓVth₁, v), cdf(Γg₁, g), θ₁) * pdf(ΓVth₁, v) * pdf(Γg₁, g)
f_Vth₂_g₂(v, g) = frankcopula_density(cdf(ΓVth₂, v), cdf(Γg₂, g), θ₂) * pdf(ΓVth₂, v) * pdf(Γg₂, g)
f_Vth_g(v, g) = f_Vth₁_g₁(v[1], g[1]) * f_Vth₂_g₂(v[2], g[2])

# Calculate matrices

K1(g) = MI() * (A() * ((A() - B(g)) \ (mD(g) + Iinit()) - Iinit()))
K2(g) = MI() * A() * ((A() - B(g)) \ MD(g))

# Define h-tilde and h-tilde^-1

h(Vth, g) = K1(g) + K2(g) * Vth

function hinvg(Ich)
    γ = real(1/s) / imag(1/s)
    Ĩch = @SMatrix [Ich[1]; Ich[2]; -Ich[1]-Ich[2]; 0; 0]
    u0 = real(MI() * Ĩch) - γ * imag(MI() * Ĩch)
    invA_I = A() \ (Ĩch + Iinit())
    U_1 = real(MI() * B(@SMatrix [1 0]) * invA_I) - γ * imag(MI() * B(@SMatrix [1 0]) * invA_I)
    U_2 = real(MI() * B(@SMatrix [0 1]) * invA_I) - γ * imag(MI() * B(@SMatrix [0 1]) * invA_I)
    U = [U_1 U_2]
    g = U \ u0
    return g
end

hinvVth(Ich, g) = real(K2(g) \ (Ich - K1(g)))  # real() used to avoid numbers become complex by numerical calculations

function hinv(Ich)
    g = hinvg(Ich)
    Vth = hinvVth(Ich, g)
    return (Vth, g)
end

# Define Jacobian

function dhinv(Vth, g)
    dMDdg1 = @SMatrix [-1/s 0; 0 0; 1/s 0; 0 0; 0 0]
    dMDdg2 = @SMatrix [0 0; 0 -1/s; 0 1/s; 0 0; 0 0]
    dmDdg1 = @SMatrix [Vg/s; 0; -Vg/s; 0; 0]
    dmDdg2 = @SMatrix [0; Vg/s; -Vg/s; 0; 0]
    AA = A()
    MII = MI()
    invA_B = inv(AA - B(g))
    invA_B_M = invA_B * (mD(g) + MD(g) * Vth + Iinit())
    dhdg1 = MII * AA * (invA_B * (B(@SMatrix [1 0])) * invA_B_M + invA_B * (dmDdg1 + dMDdg1 * Vth))
    dhdg2 = MII * AA * (invA_B * (B(@SMatrix [0 1])) * invA_B_M + invA_B * (dmDdg2 + dMDdg2 * Vth))
    K22 = K2(g)
    Jh = vcat(hcat(real(K22), real(dhdg1), real(dhdg2)), hcat(imag(K22), imag(dhdg1), imag(dhdg2)))
    return inv(Jh)
end

function f_Ich(Ich)
    Vth, g = hinv(Ich)
    J = dhinv(Vth, g)
    return abs(det(J)) * f_Vth_g(Vth, g)
end

function f_Ich_pol(r1::Float64, ϕ1::Float64, r2::Float64, ϕ2::Float64)
    Ich = @SMatrix [pol2cart(r1, ϕ1); pol2cart(r2, ϕ2)]
    return r1 * r2 * f_Ich(Ich)
end

# function extremes(prob_interval, Finv_Vths, Finv_gs)
#     Finv_Vth = SMatrix{2,1}(Finv_Vths)
#     Finv_g = SMatrix{2,1}(Finv_gs)
#     collect(h(pi_vth .|> Finv_Vth, pi_g .|> Finv_g) for pi_vth = prob_interval for pi_g = prob_interval)
# end

function extremes(prob_interval, Finv_Vths, Finv_gs)
    pts_Vths = [[Finv_Vths[1](p1); Finv_Vths[2](p2);;] for p1 = prob_interval for p2 = prob_interval]
    pts_gs = [[Finv_gs[1](p1); Finv_gs[2](p2);;] for p1 = prob_interval for p2 = prob_interval]
    [h(Vth, g) for Vth in pts_Vths for g in pts_gs]
end

pts = extremes((0.0001, 0.9999), [Finv_Vth1, Finv_Vth2], [Finv_g1, Finv_g2])

currents = reshape(collect(Iterators.flatten(pts)),2,length(pts))

Ichs = mean(currents, dims=2)
Ich1 = Ichs[1]
Ich2 = Ichs[2]

cart2pol(x) = (r=abs(x), ϕ=acos(real(x)/abs(x)) * sign(imag(x)))
pol2cart(r, ϕ) = r * (cos(ϕ) + sin(ϕ) * 1im)

currents_pol = cart2pol.(currents)
extrema_r = extrema((x -> x.r) ∘ cart2pol, currents, dims=2)
extrema_ϕ = extrema((x -> x.ϕ) ∘ cart2pol, currents, dims=2)
int_r1, int_ϕ1 = extrema_r[1], extrema_ϕ[1]
int_r2, int_ϕ2 = extrema_r[2], extrema_ϕ[2]

ich1_pol(r, ϕ) = f_Ich_pol(r, ϕ, cart2pol(Ich2)...) # fix ch2 current and get ch1
ich1_pol(r) = quadgk(ϕ -> ich1_pol(r, ϕ), int_ϕ1..., rtol=1e-3)[1] 
ich1_pol_norm = quadgk(ich1_pol, int_r1..., rtol=1e-3)[1]
ich1(is) = ich1_pol(is) / ich1_pol_norm

ich2_pol(r, ϕ) = f_Ich_pol(cart2pol(Ich1)..., r, ϕ) # fix ch1 current and get ch2
ich2_pol(r) = quadgk(ϕ -> ich2_pol(r, ϕ), int_ϕ2..., rtol=1e-3)[1]
ich2_pol_norm = quadgk(ich2_pol, int_r2..., rtol=1e-3)[1]
ich2(is) = ich2_pol(is) / ich2_pol_norm

r1 = range(int_r1..., length=100)
ϕ1 = range(int_ϕ1..., length=100)
r2 = range(int_r2..., length=100)
ϕ2 = range(int_ϕ2..., length=100)

mean_ichs = [mean(ΓVth₁); mean(ΓVth₂);;], [mean(Γg₁); mean(Γg₂);;]
@assert all((hinv ∘ h)(mean_ichs...) .≈ mean_ichs) # Test if (hinv ∘ h) is identity
mean_ichs_pol = h(mean_ichs...) .|> cart2pol

# Plot ch1 and ch2 currents in polar coordinates
h_contour = plot(map(x->x.r,mean_ichs_pol)', map(x->x.ϕ,mean_ichs_pol)', markershape=:star6, label=["ch1 mean" "ch2 mean"])
contour!(r1, ϕ1, ich1_pol.(r1', ϕ1))
contour!(r2, ϕ2, ich2_pol.(r2', ϕ2))
savefig(h_contour, "ch1_ch2_contour.png")

# Plot the distributions of ch1 and ch2 currents magnitudes
ich_lims = extrema_r |> Iterators.flatten |> collect |> extrema
h_currents = plot([ich1, ich2], xlims=ich_lims, label=["ich1" "ich2"])
savefig(h_currents, "ch1_ch2.png")

f_Ich_pol_(x) = f_Ich_pol(x[1], x[2], x[3], x[4])
ich1_pol_(r1) = hcubature(x -> f_Ich_pol_(SA[r1, x[1], x[2], x[3]]), 
                          (int_ϕ1[1],int_r2[1],int_ϕ2[1]), 
                          (int_ϕ1[2],int_r2[2],int_ϕ2[2]); 
                          reltol=1e-3)[1]
ich2_pol_(r2) = hcubature(x -> f_Ich_pol_(SA[x[2], x[1], r2, x[3]]), 
                          (int_ϕ1[1],int_r1[1],int_ϕ2[1]), 
                          (int_ϕ1[2],int_r1[2],int_ϕ2[2]); 
                          reltol=1e-3)[1]
ich1_pol_(mean_ichs_pol[1].r)
ich2_pol_(mean_ichs_pol[2].r)

xich = range(6e-8, 1.2e-7, length=50)
ich1_pol_y = ich1_pol_.(xich)
ich2_pol_y = ich2_pol_.(xich)
plot(xich, [ich1_pol_y ich2_pol_y], label=["ch1" "ch2"])

using BenchmarkTools
@btime ich1_pol_(8e-8)

@profview for _ in 1:10 ich1_pol_(8e-8) end

you can remove const if you put everything in a function, may have changed something by errir

stevengj · September 30, 2025, 12:41pm

yolhan_mannes:

ich1_pol(r, ϕ) = f_Ich_pol(r, ϕ, cart2pol(Ich2)...) # fix ch2 current and get ch1
ich1_pol(r) = quadgk(ϕ -> ich1_pol(r, ϕ), int_ϕ1..., rtol=1e-3)[1] 
ich1_pol_norm = quadgk(ich1_pol, int_r1..., rtol=1e-3)[1]

If you’re going to nest quadgk calls, I would suggest trying GitHub - lxvm/IteratedIntegration.jl: Iterated h-adaptive integration (IAI), which does this more efficiently (you want different nested calls to be “aware” of one another so that it is globally adaptive).

I still see a lot of global variables in this code, however.

owiecc · September 30, 2025, 12:43pm

This will be running in a function/module once the method is generalised. I benchmark it in a function call right now.

function dich(circuit)
    ...
    return ich1_pol_, ich2_pol_
end

PS. I only care about the ich1_pol_ and ich2_pol_ functions. ich1_pol is a previous, incorrect mathematically, attempt.

owiecc · September 30, 2025, 12:48pm

I will try this. I tried nesting quadgk calls naively and it did not go well. It was super slow.

stevengj · September 30, 2025, 12:57pm

Yes, you often have to be really careful with the tolerances (you often have to set an absolute tolerance in the inner calls) if you are going to nest 1d adaptive quadrature (as opposed to using a natively multidimensional package like HCubature.jl). @lxvm’s package does it better.

lxvm · October 1, 2025, 2:17am

To answer the question of whether nested h-adaptive quadrature is better than h-adaptive cubature entirely depends on the nature of the integrand. Presumably, due to the interest in h-adaptive methods, the integrand has some narrow features that make adaptive quadrature desirable. If this 4-dimensional function has point-like features (0d) then HCubature.jl would be the most efficient. However, if the features are higher-dimensional, such as localized lines or surfaces, then nested quadrature becomes more efficient because the method is h-adaptive in each dimension whereas cubature will be h-adaptive along the dimensions normal to the surface but effectively does composite quadrature for dimensions along the surface with hyper-cubes of side length corresponding to the most refined/subdivided dimension. HCubature.jl is actually smarter than this and can use hyper-rectangles instead of hyper-cubes, but this doesn’t help if the surface is curved because then one dimension isn’t necessarily easier to integrate than another. This gets back to the observation in the OP that the order of integration can be important although a good order can usually be decided from some analysis of the integrand and whether it is smooth/easy-to-integrate or nearly singular/hard-to-integrate along certain dimensions or near a manifold. If there is no clear simplification that can be made, then nested h-adaptive quadrature should be reasonably efficient when implemented with careful choices of tolerances, however 4-dimensional integrals will still be costly if they contain very localized features

owiecc · October 1, 2025, 7:20am

The function in question is a probability distribution of two complex variables in polar coordinates. It should be smooth. We could maybe get away with integrating one of the variables and plotting the second on a 2D surface but showing just the radius distributions is more understandable for the target users.

lxvm · October 1, 2025, 2:17pm

Since you mentioned polar coordinates, the angular variables are periodic and therefore quadrature with equispaced nodes and uniform weights is exponentially convergent with a fast rate for smooth integrands. This is also equivalent to the trapezoidal rule. If the radial variables are also smooth, have you tried using pcubature from Cubature.jl?

Topic		Replies	Views
Unreasonable slow speed in numerical integration Performance question , package , quadgk , integral	16	2712	November 12, 2022
Double integration with hcubature slower than nested quadgk General Usage	8	1525	January 19, 2023
Slow cubature Numerics performance , quadrature , hcubature	21	560	June 25, 2024
Julia integral calculation - community module or own module? General Usage gpu , packages	17	7130	August 20, 2019
Numerical integration for array General Usage	17	4325	September 23, 2020

Speeding up N-dimensional integration

Related topics