Performance problems in one-dimensional complex integration

Hello to all,
I am currently porting some code to Julia with the intention of improving its performance. However, within one function I am running into a bottleneck when computing the one-dimensional integral of a complex-valued function over the range [0, 1000].

The function, together with the parameter values I am using, is as follows:

using HCubature

S₀ = 4000.0
K = 100.0
V₀ = 0.83500
r = 0.03
κ = 2.39271
θ = 0.01887
σ = 1.48821
λ = -0.38643
ρ = -0.97271
τ = 1.0

function price_algorithm(S₀::Float64, V₀::Float64, κ::Float64, θ::Float64, 
    σ::Float64, ρ::Float64, λ::Float64, K::Float64, τ::Float64, r::Float64)

    a = κ * θ
    b = κ + λ

    function price_function(ϕ::T) where {T<:Number}

        c = ρ * σ * ϕ * 1im

        d = sqrt((c - b)^2 + (ϕ * 1im + ϕ^2) * σ^2)

        g = (b - c + d) / (b - c - d)

        part1 = exp(r * ϕ * 1im * τ)
        part2 = S₀^(ϕ * 1im) * ((1 - g * exp(d * τ)) / (1 - g))^(-2 * a / σ^2)
        part3 = exp(a * τ * (b - c + d) / σ^2 + V₀ * (b - c + d) * ((1 - exp(d * τ)) / (1 - g * exp(d * τ))) / σ^2)

        return part1 * part2 * part3

    end

    function price_integral(ϕ::Float64)

        num_integral = exp(r * τ) * price_function(ϕ - 1im) - K * price_function(ϕ)
    
        dem_integral = 1im * ϕ * K^(1im * ϕ)
    
        return num_integral / dem_integral
    
    end

    integral, _ = hquadrature(price_integral, 0, 1000)

    return (S₀ - K * exp(-r * τ)) / 2 + real(integral) / π

end

The performance, as measured with @benchmark, is:

julia> using BenchmarkTools

julia> @benchmark price_algorithm($S₀, $V₀, $κ, $θ, $σ, $ρ, $λ, $K, $τ, $r)

BenchmarkTools.Trial: 110 samples with 1 evaluation.
 Range (min … max):  39.698 ms … 90.912 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.571 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.532 ms ±  7.154 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▄▁█ ▃         
  ▇███▄█▃▅▅▃▃▃▄▆▄▃▄▄▆▁▄▄▄▅▁▄▄▁▄▁▃▁▄▃▁▃▁▁▁▁▄▁▃▁▁▁▁▁▁▁▃▁▁▁▁▁▃▁▄ ▃
  39.7 ms         Histogram: frequency by time        64.5 ms <

 Memory estimate: 129.31 KiB, allocs estimate: 6.

This seems extremely slow to me.

I would appreciate any recommendations on how to integrate this function in a more efficient way.

Note: I am aware of the QuadGK package and its one-dimensional integration routine quadgk, but that function gives a domain error for values very close to zero (for example, 1.424047472694446089e-303).

Probably that’s because you aren’t passing an absolute tolerance. Or any tolerance at all, for that matter. The default is a relative tolerance of √ε (about 8 digits), which may be more accurate than you need, and is definitely more accurate than you need if the integral mostly cancels.
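For example (the tolerance values here are just placeholders; pick whatever accuracy your application actually needs):

integral, err = quadgk(price_integral, 0, 1000; rtol = 1e-6, atol = 1e-8)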

Update: Actually it’s because the imaginary part of your integral is blowing up for your test values. If I add a @show before the integral, _ = ... in your posted code, I get:

(integral, _) = hquadrature(price_integral, 0, 1000, atol = 0.001) = (6453.582138997572 - Inf*im, Inf)

So, you are potentially timing garbage, and quadgk is complaining about the divergence.

1 Like

One useful thing I would always recommend when you are investigating this sort of thing is to plot your integrand. Here, the basic problem is that the imaginary part of your integrand blows up:

[plot: real and imaginary parts of the integrand over ϕ; the imaginary part diverges]
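A quick sketch of how to look at it (assuming Plots.jl, and that you pull the integrand out to top level so it can be called directly):

using Plots

ϕs = range(1e-3, 200; length = 2000)    # range chosen just for inspection
vals = price_integral.(ϕs)              # the integrand from the posted code
plot(ϕs, real.(vals); label = "real part")
plot!(ϕs, imag.(vals); label = "imaginary part")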
Since you only need the real part of the integral at the end, I would suggest taking the real part inside the integral instead of afterwards — then you don’t waste effort trying to integrate the divergent term that you discard anyway.

Furthermore, rather than integrating from 0 to 1000, which I’m guessing is an approximation for integrating from 0 to \infty, I would just integrate directly from 0 to Inf in QuadGK, which should typically be much more efficient.

If you are still getting NaNs, you’ll want to carefully audit your integrand function to figure out what’s going on, because you may be having floating-point overflow or similar issues that will affect correctness (but can generally be eliminated by careful rewriting).

1 Like

In particular, I noticed that your exponentials are overflowing when real(d * τ) gets large, producing NaN, but in this regime the integrand should be exponentially small (magnitude < 10^{-300}) and hence you can just return zero. So I added a check:

real(d * τ) > 700 && return zero(c)

after the d = ... line in price_function. Then I changed your integration line to:

integral, _ = quadgk(real ∘ price_integral, 0, Inf)

The computation time (as reported by @btime) sped up by 7x on my machine.

This is still using the default tolerance of rtol = sqrt(eps()). If I change it to pass rtol = 1e-3 (3 significant digits), then I get another 8x speedup (down to ≈ 0.3 ms).
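That is, the integration line becomes:

integral, _ = quadgk(real ∘ price_integral, 0, Inf; rtol = 1e-3)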

2 Likes

By the way, all of these type declarations accomplish nothing for performance. They just make your code less readable and less generic. Read the manual section on “Argument-Type Declarations”.
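That is, the outer signature could simply be the following; Julia specializes on the actual argument types at the call site either way:

# same performance as the fully annotated version
function price_algorithm(S₀, V₀, κ, θ, σ, ρ, λ, K, τ, r)
    # ... body unchanged ...
end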

1 Like

There are various additional optimizations that can help to speed up your integrand. e.g. a simple one is to use cis(x) for e^{ix}, e.g. I got a 20% speedup by using the following in price_function instead of generic exponentiation:

part1 = cis(r * ϕ * τ)
part2 = cis(logS₀ * ϕ) * ((1 - g * exp(d * τ)) / (1 - g))^(-2 * a / σ^2)

where logS₀ = log(S₀).
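That is, computed once in price_algorithm next to the other precomputed constants:

a = κ * θ
b = κ + λ
logS₀ = log(S₀)    # computed once per call, not on every integrand evaluation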

To do significantly better you might have to do more analysis of your integrand. e.g. if you can analytically work out the asymptotic decay rate, and can rescale it to e^{-x}, then you can probably use a Gauss–Laguerre quadrature scheme to good effect.
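A minimal sketch of that idea, assuming FastGaussQuadrature.jl and a hypothetical decay rate α that you would have to extract from the asymptotic analysis (g stands for the real integrand):

using FastGaussQuadrature

# nodes xᵢ and weights wᵢ satisfy Σ wᵢ f(xᵢ) ≈ ∫₀^∞ e^(-x) f(x) dx
x, w = gausslaguerre(50)

α = 1.0                        # hypothetical: the asymptotic decay rate of g
g = real ∘ price_integral      # the integrand from the posted code

# substituting t = αϕ gives ∫₀^∞ g(ϕ) dϕ ≈ (1/α) Σ wᵢ e^(xᵢ) g(xᵢ/α)
# (for large n the e^(xᵢ) factor can overflow; fold it into the weights if so)
integral = sum(w[i] * exp(x[i]) * g(x[i] / α) for i in eachindex(x)) / α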

2 Likes

Thank you very much for your advice; I have just measured a 100x performance improvement over the original code.
I will try to do more analysis of my integrand to further improve the results.

@btime price_algorithm($S₀, $V₀, $κ, $θ, $σ, $ρ, $λ, $K, $τ, $r)
379.400 μs (3 allocations: 1.69 KiB)

Isn’t 379 µs a 100x improvement over your original code (40 ms), not 8x?

1 Like

Sorry, I mistyped the number.

Some more issues:

  1. exp(d * τ) is computed multiple times in the same function. I don’t know whether Julia is able to eliminate this via common-subexpression elimination, but in any case it would make sense to deduplicate it yourself (see the sketch after this list).

  2. Your functions price_function and price_integral capture many variables from price_algorithm, and they do so in a way that (still, AFAIK) can cause serious performance problems in Julia. See the manual section on the performance of captured variables.
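For point 1, a minimal sketch of the deduplication (same math as the posted price_function, with exp(d * τ) hoisted into a local):

function price_function(ϕ)
    c = ρ * σ * ϕ * 1im
    d = sqrt((c - b)^2 + (ϕ * 1im + ϕ^2) * σ^2)
    edτ = exp(d * τ)    # hoisted: previously evaluated three times
    g = (b - c + d) / (b - c - d)
    part1 = exp(r * ϕ * 1im * τ)
    part2 = S₀^(ϕ * 1im) * ((1 - g * edτ) / (1 - g))^(-2 * a / σ^2)
    part3 = exp(a * τ * (b - c + d) / σ^2 + V₀ * (b - c + d) * ((1 - edτ) / (1 - g * edτ)) / σ^2)
    return part1 * part2 * part3
end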

1 Like

Julia isn’t yet smart enough to remove the redundant exp(d * τ) calls. We have the analysis (the conditions are exactly the same as for dead-code elimination), but no one has written the pass that removes the redundancy. The pull request “Rebase of julia effects to LLVM” by gbaraldi (JuliaLang/julia#50188) would probably let LLVM remove the redundant call, though.

1 Like

Probably not for complex arguments?

I don’t think that’s an issue here; it would show up as lots of additional allocations in @btime, and those aren’t happening.

1 Like

On the Julia side, the effects are good:

julia> Base.infer_effects(exp, (Complex{Float64},))
(+c,+e,!n,+t,+s,+m,+i)

So if we taught LLVM about the Julia effects, which is what that PR does, LLVM should be able to remove the call.

2 Likes

I guess that the compiler sees that the captures are read-only so it doesn’t box them? Nice!

As far as I can see, you could also move exp(r * τ) outside price_integral.
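A minimal sketch (erτ is just my name for the hoisted value):

erτ = exp(r * τ)    # hoisted: does not depend on ϕ

function price_integral(ϕ)
    num_integral = erτ * price_function(ϕ - 1im) - K * price_function(ϕ)
    dem_integral = 1im * ϕ * K^(1im * ϕ)
    return num_integral / dem_integral
end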

1 Like