Zygote performance for a simple function

Hi all,
I have the following example in Zygote for computing a single gradient, and I don’t understand why I get this performance regression across the following implementations:

using SpecialFunctions, BenchmarkTools, Zygote, IrrationalConstants

function normcdf_wrong(x)
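    # not the standard normal CDF: the argument is missing a sign flip and a 1/√2 scaling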
    return erfc(x) / 2
end

function blsdelta_wrong(S0, K, r, T, σ, d)
    rt = r * T
    dt = -d * T
    sigma_sqrtT = σ * sqrt(T)
    S0_K = S0 / K
    d_1 = (log(S0_K) + rt + dt) / sigma_sqrtT + sigma_sqrtT / 2
    Δ = exp(dt) * normcdf_wrong(d_1)
    return Δ
end

function normcdf_wrong_2(x)
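    # sign fixed relative to normcdf_wrong, but still missing the 1/√2 scaling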
    return erfc(-x) / 2
end

function blsdelta_wrong_2(S0, K, r, T, σ, d)
    rt = r * T
    dt = -d * T
    sigma_sqrtT = σ * sqrt(T)
    S0_K = S0 / K
    d_1 = (log(S0_K) + rt + dt) / sigma_sqrtT + sigma_sqrtT / 2
    Δ = exp(dt) * normcdf_wrong_2(d_1)
    return Δ
end

function normcdf_correct(x)
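    # standard normal CDF: Φ(x) = erfc(-x/√2) / 2, with invsqrt2 from IrrationalConstants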
    return erfc(-x * invsqrt2) / 2
end

function blsdelta_correct(S0, K, r, T, σ, d)
    rt = r * T
    dt = -d * T
    sigma_sqrtT = σ * sqrt(T)
    S0_K = S0 / K
    d_1 = (log(S0_K) + rt + dt) / sigma_sqrtT + sigma_sqrtT / 2
    Δ = exp(dt) * normcdf_correct(d_1)
    return Δ
end

S0 = 100.0; K = 100.0; r = 0.02; T = 1.2; σ = 0.2; d = 0.01;

@btime Zygote.gradient(blsdelta_wrong, $S0, $K, $r, $T, $σ, $d);   # performance is fine
@btime Zygote.gradient(blsdelta_wrong_2, $S0, $K, $r, $T, $σ, $d); # allocating?
@btime Zygote.gradient(blsdelta_correct, $S0, $K, $r, $T, $σ, $d); # allocating?

The first call performs fine, but unfortunately it isn’t correct. The second isn’t correct either, but it allocates (the only difference is a minus sign). The third one is correct from a numerical point of view, but it allocates for some reason.
Has anybody experienced anything like this?
(I think this is the first post I write; apologies in advance to the moderators.)

I see very similar benchmark stats for the correct and “wrong” implementations. Can you post your results?

using Flux, SpecialFunctions, BenchmarkTools

function normcdf(x)
    return erfc(-x * sqrt(0.5)) * 0.5
end

function bsdelta(S, K, r, T, σ, d)
    v = σ * sqrt(T)
    d1 = (log(S/K) + (r-d)*T) / v + 0.5 * v
    return exp(-d*T) * normcdf(d1)
end

pars = [1.0, 1.0, 0.02, 1.2, 0.2, 0.01]
@btime Flux.gradient(p -> bsdelta(p...), $(pars))
 1.297 μs (42 allocations: 5.73 KiB)
([1.7750677670750958, -1.7750677670750958, 2.130081320490115, 0.021040851149368833, 0.10650406602450575, -2.8003011633010275],)

I see the following:

@btime Zygote.gradient(blsdelta_wrong, $S0, $K, $r, $T, $σ, $d);   # performance is fine
  #65.988 ns (1 allocation: 16 bytes)

@btime Zygote.gradient(blsdelta_wrong_2, $S0, $K, $r, $T, $σ, $d); # allocating?
  #2.567 μs (45 allocations: 2.86 KiB)

@btime Zygote.gradient(blsdelta_correct, $S0, $K, $r, $T, $σ, $d); # allocating?
  #2.522 μs (45 allocations: 2.92 KiB)

The problem is almost certainly with -x * invsqrt2. IrrationalConstants is likely doing a bunch of computation behind the scenes which is not Zygote-friendly. If you substitute -x * inv(sqrt(2)), the allocations should disappear.
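
That is, a sketch like the following (normcdf_plain is just an illustrative name for the substitution):

function normcdf_plain(x)
    # plain floating-point constant instead of the irrational invsqrt2
    return erfc(-x * inv(sqrt(2))) / 2
end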

There is no irrational in the second example.

I am surprised at the first result (1 allocation). I see the same allocation for all three cases (except that I use sqrt(0.5) for the correct case).

The second example shows that it isn’t really related to the irrationals.
Anyway, I don’t use sqrt(0.5) because in the case of Float32 it adds precision that I don’t need, and in the case of BigFloat it loses precision.

You’re right, mea culpa. The next thing to do would be to use the usual tools for detecting type instabilities (@code_warntype, Cthulhu.jl, etc.) to dig further.
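
For instance, something along these lines (the 1.0 passed to the pullback is just the seed for the scalar output):

y, back = Zygote.pullback(blsdelta_correct, S0, K, r, T, σ, d)
@code_warntype back(1.0)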

That’s because you’re splatting, I think. Splatting will always incur some allocations.
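
For example, reusing bsdelta and pars from the earlier post, a quick comparison would look something like this (exact numbers will vary):

@btime Flux.gradient(p -> bsdelta(p...), $pars)                # splatting a Vector allocates
@btime Flux.gradient(bsdelta, 1.0, 1.0, 0.02, 1.2, 0.2, 0.01)  # passing scalars directly should avoid most of it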

That’s why we have generic functions, which let you work with multiple different number types; e.g. sqrt(one(x) / 2) should neither add nor lose precision for Float32 and BigFloat.
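
A sketch of that pattern applied to the function above (normcdf_generic is just an illustrative name):

function normcdf_generic(x)
    # the constant takes its precision from the type of x
    return erfc(-x * sqrt(one(x) / 2)) / 2
end

normcdf_generic(1.0f0)     # all-Float32 arithmetic
normcdf_generic(big"1.0")  # full BigFloat precision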

Just as a comment, one(x) wouldn’t work for other number types, like complex or dual numbers.

At least for those two examples it works, but perhaps you have other constraints that weren’t expressed in the original post.

Back to the allocations, I had a look at both the generated Julia and LLVM IR with Cthulhu. It appears that replacing -x with a different operation (e.g. sin(x)) has the same outcome. Yet I’m only able to see one allocation at either the Julia or LLVM level. It feels like this is somehow hitting some compiler heuristic, but I can’t figure out what.
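
For anyone who wants to reproduce the inspection, the entry point is something like:

using Cthulhu
@descend Zygote.gradient(blsdelta_correct, S0, K, r, T, σ, d)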

I saw the application (calculating second-order greeks) and thus was not worried about precision beyond Float32 (it would be meaningless there).

[Not related to the post] Since when is the precision meaningless?
Greeks aren’t used just for hedging, where maybe you need precision up to cents.

If I delete exp(dt) from the functions, the allocations go back to normal. Do you have any idea what is wrong with it?
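
Concretely, something like this (blsdelta_noexp is just a name for this post):

function blsdelta_noexp(S0, K, r, T, σ, d)
    rt = r * T
    dt = -d * T
    sigma_sqrtT = σ * sqrt(T)
    S0_K = S0 / K
    d_1 = (log(S0_K) + rt + dt) / sigma_sqrtT + sigma_sqrtT / 2
    return normcdf_correct(d_1)  # same as blsdelta_correct, minus the exp(dt) factor
end

@btime Zygote.gradient(blsdelta_noexp, $S0, $K, $r, $T, $σ, $d);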

The amount of precision you need is a function of the application. From my experience in financial engineering, I have never encountered a case requiring precision beyond Float32 for delta, vega, gamma, vanna, theta, or rho. I am keen to hear about your use case which requires it.

[Again, not related to the post]
It is a library function and I don’t know the potential final use.
I can even imagine that somebody wants to do a Taylor expansion on top of it, in which case the precision is fundamental. Another example could be an implied-volatility calculator.

I don’t know exactly where it will sit in a generic pipeline, and forcing Float32 instead of letting the client choose is not a good idea, especially in a language like Julia where there is no need for that.