Speeding up N-dimensional integration

I am struggling to speed up a N-dimensional integration problem. To me it looks that allocations are the problem. I chased as many allocations I could find in the code but I still would like to speed-up the code by at least two orders of magnitude.

@btime ich1_pol_(8e-8)
436.081 ms (4472909 allocations: 315.91 MiB)

It is a four dimensional function and I would like to integrate three of the dimensions and visualise the resulting function ich1_pol_ .

I see order of integrations is very important: Unreasonable slow speed in numerical integration - #17 by swish47 but I don’t have the intuition on how to split up the problem. Right now I am using the Cubature.jl package.

From a quick review I see you’re using an insane amount of non constant global variables, i think this is a real bottleneck, it’s not that bad for Matrix because the inner element will be type stable but with SMatrix all elements will end up being considered Any

I know about the global scope. For reals I run the code in a function and there is almost no performance difference.

Here is a profiler view for the code: pastebin

All allocations are in the body of f_Ich_pol where I do not see anything that could possibly allocate and in the hcubature function.

Oh ok will try home later on :+1:

There’s this:

f_Ich_pol_(x) = f_Ich_pol(x[1], x[2], x[3], x[4])

ich1_pol_(r1) = hcubature(x -> f_Ich_pol_([r1, x[1], x[2], x[3]]), 
                          (int_ϕ1[1],int_r2[1],int_ϕ2[1]), 
                          (int_ϕ1[2],int_r2[2],int_ϕ2[2]); 
                         reltol=1e-3)[1]

You allocate a vector [r1, x[1], x[2], x[3]] which you immediately unpack in f_Ich_pol_. Have you tried to use a tuple instead?

ich1_pol_(r1) = hcubature(x -> f_Ich_pol_((r1, x[1], x[2], x[3])), 
...

Thanks! 10% less allocations but speed is still the same :frowning:

The only allocations I see now are here:

        0 function f_Ich_pol(r1, ϕ1, r2, ϕ2)
  4398240     Ich = @SMatrix[pol2cart(r1, ϕ1); pol2cart(r2, ϕ2)]
 58747920     return r1 * r2 * f_Ich(Ich)
        - end

where e.g.

pol2cart(r, ϕ) = r * (cos(ϕ) + sin(ϕ) * 1im)

that returns a ComplexF64 so I do not understand where do allocations come from in line Ich = @SMatrix.

reduced quite a bit (4x speedup for me)


using LinearAlgebra, QuadGK, Distributions, StaticArrays, StatsBase, Parameters, Plots, Cubature


const f = 7.5e6
const C1 = 100e-12
const C2 = 100e-12
const L1 = 20e-9
const L2 = 25e-9
const Lc = 20e-9
const V1 = 100
const Vg = 20
const circuit = (C1=C1, C2=C2, L1=L1, L2=L2, Lc=Lc, V1=V1, Vg=Vg)
const T = 1e-3
const s = 2π / T + 2π * f * 1im

# @unpack C1, C2, L1, L2, Lc, V1, Vg = circuit

const Γg₁, ΓVth₁, θ₁ = Gamma(182.3, 0.0303), Gamma(172.27, 0.04218), -7.5
const Γg₂, ΓVth₂, θ₂ = Gamma(182.3, 0.0303), Gamma(172.27, 0.04218), -7.5

# Define vectors and matrices

A() = @SMatrix [1/(s*L1)+s*C1 0 -s*C1 -1/(s*L1) 0; 0 1/(s*L2)+s*C2 -s*C2 -1/(s*L2) 0; -s*C1 -s*C2 s*(C1+C2) 0 1; -1/(s*L1) -1/(s*L2) 0 1/(s*Lc)+1/(s*L1)+1/(s*L2) 0; 0 0 1 0 0]
B(g) = @SMatrix [-g[1] 0 0 0 0; 0 -g[2] 0 0 0; g[1] g[2] 0 0 0; 0 0 0 0 0; 0 0 0 0 0]
MI() = @SMatrix [1 0 0 0 0; 0 1 0 0 0]
mD(g) = @SMatrix [g[1]*Vg/s; g[2]*Vg/s; -(g[1] + g[2])*Vg/s; 0; 0]
MD(g) = @SMatrix [-g[1]/s 0; 0 -g[2]/s; g[1]/s g[2]/s; 0 0; 0 0]
Iinit() = @SMatrix [-C1 * (V1 - Vg + mean(ΓVth₁)); -C2 * (V1 - Vg + mean(ΓVth₂)); (C1 + C2) * (V1 - Vg) + C1 * mean(ΓVth₁) + C2 * mean(ΓVth₂); 0; V1 / s]

# Define distributions

frankcopula_density(u₁, u₂, θ) = θ * (1 - exp(-θ)) * exp(-θ * (u₁ + u₂)) * ((1 - exp(-θ)) - (1 - exp(-θ * u₁)) * (1 - exp(-θ * u₂)))^-2

Finv_Vth1(u) = quantile(ΓVth₁, u)
Finv_g1(u) = quantile(Γg₁, u)
Finv_Vth2(u) = quantile(ΓVth₂, u)
Finv_g2(u) = quantile(Γg₂, u)

f_Vth₁_g₁(v, g) = frankcopula_density(cdf(ΓVth₁, v), cdf(Γg₁, g), θ₁) * pdf(ΓVth₁, v) * pdf(Γg₁, g)
f_Vth₂_g₂(v, g) = frankcopula_density(cdf(ΓVth₂, v), cdf(Γg₂, g), θ₂) * pdf(ΓVth₂, v) * pdf(Γg₂, g)
f_Vth_g(v, g) = f_Vth₁_g₁(v[1], g[1]) * f_Vth₂_g₂(v[2], g[2])

# Calculate matrices

K1(g) = MI() * (A() * ((A() - B(g)) \ (mD(g) + Iinit()) - Iinit()))
K2(g) = MI() * A() * ((A() - B(g)) \ MD(g))

# Define h-tilde and h-tilde^-1

h(Vth, g) = K1(g) + K2(g) * Vth

function hinvg(Ich)
    γ = real(1/s) / imag(1/s)
    Ĩch = @SMatrix [Ich[1]; Ich[2]; -Ich[1]-Ich[2]; 0; 0]
    u0 = real(MI() * Ĩch) - γ * imag(MI() * Ĩch)
    invA_I = A() \ (Ĩch + Iinit())
    U_1 = real(MI() * B(@SMatrix [1 0]) * invA_I) - γ * imag(MI() * B(@SMatrix [1 0]) * invA_I)
    U_2 = real(MI() * B(@SMatrix [0 1]) * invA_I) - γ * imag(MI() * B(@SMatrix [0 1]) * invA_I)
    U = [U_1 U_2]
    g = U \ u0
    return g
end

hinvVth(Ich, g) = real(K2(g) \ (Ich - K1(g)))  # real() used to avoid numbers become complex by numerical calculations

function hinv(Ich)
    g = hinvg(Ich)
    Vth = hinvVth(Ich, g)
    return (Vth, g)
end

# Define Jacobian

function dhinv(Vth, g)
    dMDdg1 = @SMatrix [-1/s 0; 0 0; 1/s 0; 0 0; 0 0]
    dMDdg2 = @SMatrix [0 0; 0 -1/s; 0 1/s; 0 0; 0 0]
    dmDdg1 = @SMatrix [Vg/s; 0; -Vg/s; 0; 0]
    dmDdg2 = @SMatrix [0; Vg/s; -Vg/s; 0; 0]
    AA = A()
    MII = MI()
    invA_B = inv(AA - B(g))
    invA_B_M = invA_B * (mD(g) + MD(g) * Vth + Iinit())
    dhdg1 = MII * AA * (invA_B * (B(@SMatrix [1 0])) * invA_B_M + invA_B * (dmDdg1 + dMDdg1 * Vth))
    dhdg2 = MII * AA * (invA_B * (B(@SMatrix [0 1])) * invA_B_M + invA_B * (dmDdg2 + dMDdg2 * Vth))
    K22 = K2(g)
    Jh = vcat(hcat(real(K22), real(dhdg1), real(dhdg2)), hcat(imag(K22), imag(dhdg1), imag(dhdg2)))
    return inv(Jh)
end

function f_Ich(Ich)
    Vth, g = hinv(Ich)
    J = dhinv(Vth, g)
    return abs(det(J)) * f_Vth_g(Vth, g)
end

function f_Ich_pol(r1::Float64, ϕ1::Float64, r2::Float64, ϕ2::Float64)
    Ich = @SMatrix [pol2cart(r1, ϕ1); pol2cart(r2, ϕ2)]
    return r1 * r2 * f_Ich(Ich)
end

# function extremes(prob_interval, Finv_Vths, Finv_gs)
#     Finv_Vth = SMatrix{2,1}(Finv_Vths)
#     Finv_g = SMatrix{2,1}(Finv_gs)
#     collect(h(pi_vth .|> Finv_Vth, pi_g .|> Finv_g) for pi_vth = prob_interval for pi_g = prob_interval)
# end

function extremes(prob_interval, Finv_Vths, Finv_gs)
    pts_Vths = [[Finv_Vths[1](p1); Finv_Vths[2](p2);;] for p1 = prob_interval for p2 = prob_interval]
    pts_gs = [[Finv_gs[1](p1); Finv_gs[2](p2);;] for p1 = prob_interval for p2 = prob_interval]
    [h(Vth, g) for Vth in pts_Vths for g in pts_gs]
end

pts = extremes((0.0001, 0.9999), [Finv_Vth1, Finv_Vth2], [Finv_g1, Finv_g2])

currents = reshape(collect(Iterators.flatten(pts)),2,length(pts))

Ichs = mean(currents, dims=2)
Ich1 = Ichs[1]
Ich2 = Ichs[2]

cart2pol(x) = (r=abs(x), ϕ=acos(real(x)/abs(x)) * sign(imag(x)))
pol2cart(r, ϕ) = r * (cos(ϕ) + sin(ϕ) * 1im)

currents_pol = cart2pol.(currents)
extrema_r = extrema((x -> x.r) ∘ cart2pol, currents, dims=2)
extrema_ϕ = extrema((x -> x.ϕ) ∘ cart2pol, currents, dims=2)
int_r1, int_ϕ1 = extrema_r[1], extrema_ϕ[1]
int_r2, int_ϕ2 = extrema_r[2], extrema_ϕ[2]

ich1_pol(r, ϕ) = f_Ich_pol(r, ϕ, cart2pol(Ich2)...) # fix ch2 current and get ch1
ich1_pol(r) = quadgk(ϕ -> ich1_pol(r, ϕ), int_ϕ1..., rtol=1e-3)[1] 
ich1_pol_norm = quadgk(ich1_pol, int_r1..., rtol=1e-3)[1]
ich1(is) = ich1_pol(is) / ich1_pol_norm

ich2_pol(r, ϕ) = f_Ich_pol(cart2pol(Ich1)..., r, ϕ) # fix ch1 current and get ch2
ich2_pol(r) = quadgk(ϕ -> ich2_pol(r, ϕ), int_ϕ2..., rtol=1e-3)[1]
ich2_pol_norm = quadgk(ich2_pol, int_r2..., rtol=1e-3)[1]
ich2(is) = ich2_pol(is) / ich2_pol_norm

r1 = range(int_r1..., length=100)
ϕ1 = range(int_ϕ1..., length=100)
r2 = range(int_r2..., length=100)
ϕ2 = range(int_ϕ2..., length=100)

mean_ichs = [mean(ΓVth₁); mean(ΓVth₂);;], [mean(Γg₁); mean(Γg₂);;]
@assert all((hinv ∘ h)(mean_ichs...) .≈ mean_ichs) # Test if (hinv ∘ h) is identity
mean_ichs_pol = h(mean_ichs...) .|> cart2pol

# Plot ch1 and ch2 currents in polar coordinates
h_contour = plot(map(x->x.r,mean_ichs_pol)', map(x->x.ϕ,mean_ichs_pol)', markershape=:star6, label=["ch1 mean" "ch2 mean"])
contour!(r1, ϕ1, ich1_pol.(r1', ϕ1))
contour!(r2, ϕ2, ich2_pol.(r2', ϕ2))
savefig(h_contour, "ch1_ch2_contour.png")

# Plot the distributions of ch1 and ch2 currents magnitudes
ich_lims = extrema_r |> Iterators.flatten |> collect |> extrema
h_currents = plot([ich1, ich2], xlims=ich_lims, label=["ich1" "ich2"])
savefig(h_currents, "ch1_ch2.png")

f_Ich_pol_(x) = f_Ich_pol(x[1], x[2], x[3], x[4])
ich1_pol_(r1) = hcubature(x -> f_Ich_pol_(SA[r1, x[1], x[2], x[3]]), 
                          (int_ϕ1[1],int_r2[1],int_ϕ2[1]), 
                          (int_ϕ1[2],int_r2[2],int_ϕ2[2]); 
                          reltol=1e-3)[1]
ich2_pol_(r2) = hcubature(x -> f_Ich_pol_(SA[x[2], x[1], r2, x[3]]), 
                          (int_ϕ1[1],int_r1[1],int_ϕ2[1]), 
                          (int_ϕ1[2],int_r1[2],int_ϕ2[2]); 
                          reltol=1e-3)[1]
ich1_pol_(mean_ichs_pol[1].r)
ich2_pol_(mean_ichs_pol[2].r)

xich = range(6e-8, 1.2e-7, length=50)
ich1_pol_y = ich1_pol_.(xich)
ich2_pol_y = ich2_pol_.(xich)
plot(xich, [ich1_pol_y ich2_pol_y], label=["ch1" "ch2"])

using BenchmarkTools
@btime ich1_pol_(8e-8)

@profview for _ in 1:10 ich1_pol_(8e-8) end

you can remove const if you put everything in a function, may have changed something by errir

If you’re going to nest quadgk calls, I would suggest trying GitHub - lxvm/IteratedIntegration.jl: Iterated h-adaptive integration (IAI), which does this more efficiently (you want different nested calls to be “aware” of one another so that it is globally adaptive).

I still see a lot of global variables in this code, however.

This will be running in a function/module once the method is generalised. I benchmark it in a function call right now.

function dich(circuit)
    ...
    return ich1_pol_, ich2_pol_
end

PS. I only care about the ich1_pol_ and ich2_pol_ functions. ich1_pol is a previous, incorrect mathematically, attempt.

I will try this. I tried nesting quadgk calls naively and it did not go well. It was super slow.

Yes, you often have to be really careful with the tolerances (you often have to set an absolute tolerance in the inner calls) if you are going to nest 1d adaptive quadrature (as opposed to using a natively multidimensional package like HCubature.jl). @lxvm’s package does it better.