How to tell ForwardDiff what i want it to do?

Hey,

This is a follow up to this previous post where I was looking for the right derivatives of the gamma_inc function. I found them. Now i am trying to tell ForwardDiff to use them… with issues.

The function has two arguments a and x. The derivtive w.r.t. x is already in ForwardDiff:

ForwardDiff.derivative(x -> gamma_inc(2,x)[2],2.0) # Works


I just added the derivative w.r.t a:

using SpecialFunctions, HypergeometricFunctions, ForwardDiff
function Γ(a,x)
return gamma_inc(a,x)[2]
end
function ∂Γ_∂a(a,x)
g = gamma(a)
dg = digamma(a)
G = Γ(a,x)
lx = log(x)
r = pFq([a,a],[a+1,a+1],-x)
return a^(-2) * x^a * r/g + (1 - G)*(dg - lx)
end
function SpecialFunctions.gamma_inc(d::ForwardDiff.Dual{T,<:Real}, x::Real, ind::Integer) where {T}
a = ForwardDiff.value(d)
p, q = SpecialFunctions.gamma_inc(a, x, ind)
∂q = ∂Γ_∂a(a,x)
return (ForwardDiff.Dual{T}(p, -∂q), ForwardDiff.Dual{T}(q, ∂q))
end
ForwardDiff.derivative(a -> gamma_inc(a,3.0)[2],2.0) # Works


But somehow, forwardiff does not want to pass dual numbers through both parameters :

ForwardDiff.derivative(a -> gamma_inc(a,3+a)[2],2.0) # Doesnt work.


Fair enough. Remembering my 101 lectures on the chain rule, I craft f(dual,dual) from f(dual,x) and f(a,dual) as follows:

function SpecialFunctions.gamma_inc(da::ForwardDiff.Dual{T,<:Real}, dx::ForwardDiff.Dual{T,<:Real}, ind::Integer) where {T}
a = ForwardDiff.value(da)
x = ForwardDiff.value(dx)
∂1 = SpecialFunctions.gamma_inc(da, x, ind::Integer)
∂2 = SpecialFunctions.gamma_inc(a, dx, ind::Integer)
return ∂1 .* da .+ ∂2 .* dx
end


And now it works :

ForwardDiff.derivative(a -> gamma_inc(a,3+a)[2],2.0) # Works !


Youpi ! So i go back to what i was doing, trying to use it… but no :

ForwardDiff.gradient(x -> gamma_inc(x[1]+x[2],x[1]-x[2])[2],[3.0,1.0]) # Still does not work...


I think that the derivatives i added should be enough to compute this gradient ! I do not understand what I missed. Could someone help?

From the stack trace, I think it may come from the line ∂1 .* da .+ ∂2 .* dx in my return statement, where, in case we are taking gradients w.r.t. several variables, the multiplications / additions do not work correctly; Maybe i should re-write this somehow differently ?

EDIT :

If I change the function to the following to see things :

function SpecialFunctions.gamma_inc(da::ForwardDiff.Dual{T,<:Real}, dx::ForwardDiff.Dual{T,<:Real}, ind::Integer) where {T}
a = ForwardDiff.value(da)
x = ForwardDiff.value(dx)
∂1 = SpecialFunctions.gamma_inc(da, x, ind::Integer)
∂2 = SpecialFunctions.gamma_inc(a, dx, ind::Integer)
@show da
@show dx
@show ∂1[1]
@show ∂1[1]
@show ∂2[1]
@show ∂2[2]
return (∂1[1] * da + ∂2[1] * dx,∂1[2] * da + ∂2[2] * dx)
end


Then the stacktrace I Have looks like :

julia> ForwardDiff.gradient(x -> gamma_inc(x[1]+x[2],x[1]-x[2])[2],[3.0,1.0]) # Still does not work...
da = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(4.0,1.0,1.0)
dx = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(2.0,1.0,-1.0)
∂1[1] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.1428765395014529,-0.1302623114706677)
∂1[1] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.1428765395014529,-0.1302623114706677)
∂2[1] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.1428765395014529,0.1804470443154836,-0.1804470443154836)
∂2[2] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.8571234604985472,-0.1804470443154836,0.1804470443154836)
ERROR: MethodError: no method matching _mul_partials(::ForwardDiff.Partials{1, Float64}, ::ForwardDiff.Partials{2, Float64}, ::Float64, ::Float64)
Closest candidates are:
_mul_partials(::ForwardDiff.Partials{0, A}, ::ForwardDiff.Partials{N, B}, ::Any, ::Any) where {N, A, B} at C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\partials.jl:141
_mul_partials(::ForwardDiff.Partials{N, A}, ::ForwardDiff.Partials{0, B}, ::Any, ::Any) where {N, A, B} at C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\partials.jl:142
_mul_partials(::ForwardDiff.Partials{N}, ::ForwardDiff.Partials{N}, ::Any, ::Any) where N at C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\partials.jl:118
...
Stacktrace:
[1] dual_definition_retval(#unused#::Val{ForwardDiff.Tag{var"#60#61", Float64}}, val::Float64, deriv1::Float64, partial1::ForwardDiff.Partials{1, Float64}, deriv2::Float64, partial2::ForwardDiff.Partials{2, Float64})
@ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\dual.jl:203
[2] *
@ C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\dual.jl:271 [inlined]
[3] gamma_inc(da::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}, dx::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}, ind::Int64)
@ Main .\Untitled-1:55
[4] gamma_inc(a::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2})
@ SpecialFunctions C:\Users\lrnv\.julia\packages\SpecialFunctions\hefUc\src\gamma_inc.jl:858
[5] (::var"#60#61")(x::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}})
@ Main .\Untitled-1:60
[6] vector_mode_dual_eval!(f::var"#60#61", cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}}, x::Vector{Float64})
@ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\apiutils.jl:37
[7] vector_mode_gradient(f::var"#60#61", x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}})
[8] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}}, ::Val{true})
[9] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}}) (repeats 2 times)
[10] top-level scope
@ Untitled-1:60

julia>


I think the issue is that the four partials only have 2 values : there is only one derivative w.r.t. a or x, while da and dx have 3 values: there is two derivatives. Which is logic.

How could I modify the ∂1[1],∂1[2],∂2[1] and ∂2[2] to have one more (zero) derivative in them so that the multiplication works ?

EDIT :I tried with ReverseDiff, but no luck yet :

using SpecialFunctions, HypergeometricFunctions, ReverseDiff, ChainRulesCore
function ∂Γ_∂a(a,x)
g = gamma(a)
dg = digamma(a)
G = gamma_inc(a,x)[1]
lx = log(x)
r = pFq([a,a],[a+1,a+1],-x)
return a^(-2) * x^a * r/g + G*(dg - lx)
end
# Copied and modified the chainrule from SpecialFunctions.jl:
ChainRulesCore.@scalar_rule(
SpecialFunctions.gamma_inc(a, x, IND),
@setup(z = exp(-x) * x^(a - 1) / gamma(a)),
(
-∂Γ_∂a(a,x),
z,
ChainRulesCore.NoTangent(),
),
(
∂Γ_∂a(a,x),
-z,
ChainRulesCore.NoTangent(),
),
)

# Shananigans to call reversediff
# I cannot find the right way to call it...
# And when it looks like i manage to clal it correclty, it does not use my chainrule...
gamma_inc_arr(x) = gamma_inc.(x,2.3)

1 Like

I think I have something, looks like I missed a ulitplication by partials() in the ForwardDiff implementation :

using SpecialFunctions, HypergeometricFunctions, ForwardDiff

function SpecialFunctions.gamma_inc(d::ForwardDiff.Dual{T,<:Real}, x::Real, ind::Integer) where {T}
a = ForwardDiff.value(d)
g = gamma(a)
dg = digamma(a)
p, q = SpecialFunctions.gamma_inc(a, x, ind)
lx = log(x)
r = pFq([a,a],[a+1,a+1],-x)
dq = a^(-2) * x^a * r/g + p*(dg - lx)
∂q = dq * ForwardDiff.partials(d) ############ <<<------- I missed this multiplication.
return (ForwardDiff.Dual{T}(p, -∂q), ForwardDiff.Dual{T}(q, ∂q))
end
function SpecialFunctions.gamma_inc(da::ForwardDiff.Dual{T,<:Real}, dx::ForwardDiff.Dual{T,<:Real}, ind::Integer) where {T}
a = ForwardDiff.value(da)
x = ForwardDiff.value(dx)
∂1 = SpecialFunctions.gamma_inc(da, x, ind::Integer) # this partial is w.r.t. a
∂2 = SpecialFunctions.gamma_inc(a, dx, ind::Integer) # this partial is w.r.t. x
return (∂1[1] * da + ∂2[1] * dx,∂1[2] * da + ∂2[2] * dx)
end

# Now everything works :
ForwardDiff.derivative(x -> gamma_inc(2,x)[2],2.0)
ForwardDiff.derivative(a -> gamma_inc(a,3.0)[2],2.0)
ForwardDiff.derivative(a -> gamma_inc(a,3+a)[2],2.0)


EDIT : No, this still seems ot be wrong.

Honestly, your approach looks good to me!

For reference, note down the gradient at [2.0,1.0] approximated with finite differences:

julia> dGda = (gamma_inc(2.01,1.0)[2] - gamma_inc(1.99,1.0)[2])/(2*0.01)
0.2761963494263997

julia> dGdx = (gamma_inc(2.0,1.01)[2] - gamma_inc(2.0,0.99)[2])/(2*0.01)
-0.36787330975545096


With your ChainRules definition, and

gamma_inc_q(args...) = gamma_inc(args...)[2]
gamma_inc_q(v) = gamma_inc(v...)[2]


to extract the second component of gamma_inc, Zygote yields

julia> Zygote.gradient(x->gamma_inc_q(1.0,x), 1.0)
(-0.36787944117144233,)

(0.2761960457028353,)

julia> J = Zygote.gradient(gamma_inc_q, [2.0,1.0])[1]
[0.2761960457028353, -0.36787944117144233]


in full agreement.

Via the chain rule, \frac{d \gamma(f(a),g(a))}{d a} = \nabla\gamma \cdot (f'(a),g'(a))^t. As an example chose f(a) = 2a, g(a)=a

julia> Zygote.gradient(a->gamma_inc_q(2a,a), 1.0)
(0.18451265023422825,)

julia> dot(J, [2.0,1.0])
0.18451265023422825


The ForwardDiff rule you defined yields the same results for me. I have however defined

function SpecialFunctions.gamma_inc(a::Dual{T,<:Real,N}, x::Dual{T, <:Real,N}, ind::Integer) where {T,N}
da = SpecialFunctions.gamma_inc(a, value(x), ind)[2]
dx = SpecialFunctions.gamma_inc(value(a), x, ind)[2]

par = partials(da) + partials(dx)

Dq = Dual{T}(value(da), par)
Dp = Dual{T}(1-value(da), -par)

return [Dp,Dq]
end


to return a vector of duals in order to facilitate using jacobian which expects an array to be returned.

julia> gamma_vec(a) = gamma_inc(a...)

julia> ForwardDiff.jacobian(gamma_vec, [2.0, 1.0])
2×2 Matrix{Float64}:
-0.276196   0.367879
0.276196  -0.367879

julia> derivative(a -> gamma_inc_q(a,1.0),2.0)*2 +
derivative(x -> gamma_inc_q(2.0,x),1.0) ==
derivative(x->gamma_inc_q(2x,x), 1.0) ==
(ForwardDiff.jacobian(gamma_vec, [2.0, 1.0])*[2.0,1.0])[2]
true


Would you mind summarizing what you defined all at once ? Im getting lost a little. I’m specifically looking at the forwardiff version, for now on.

I think that my gradients are wrong, since they do not match with finite differences, and i suspect this is because i do not have the right function for gamma_inc(dual,dual)… So what did you clearly defined ?

Understandable

import ForwardDiff: Dual, derivative, jacobian, value, partials, Tag, Partials
using SpecialFunctions, HypergeometricFunctions, ForwardDiff
using Zygote
using ChainRulesCore

function Γ(a,x,ind::Integer)
return gamma_inc(a,x,ind)[2]
end
function ∂Γ_∂a(a,x,ind)
g = gamma(a)
dg = digamma(a)
G = Γ(a,x,ind)
lx = log(x)
r = pFq([a,a],[a+1,a+1],-x)
return a^(-2) * x^a * r/g + (1 - G)*(dg - lx)
end
function SpecialFunctions.gamma_inc(a::ForwardDiff.Dual{T,<:Real}, x::Real, ind::Integer) where {T}
_a = ForwardDiff.value(a)
q = Γ(_a, x, ind)
∂q = ∂Γ_∂a(_a,x,ind)*partials(a)
return (ForwardDiff.Dual{T}(1-q, -∂q), ForwardDiff.Dual{T}(q, ∂q))
end
function SpecialFunctions.gamma_inc(a::Dual{T,<:Real,N}, x::Dual{T, <:Real,N}, ind::Integer) where {T,N}
da = SpecialFunctions.gamma_inc(a, value(x), ind)[2]
dx = SpecialFunctions.gamma_inc(value(a), x, ind)[2]

par = partials(da) + partials(dx)

Dq = Dual{T}(value(da), par)
Dp = Dual{T}(1-value(da), -par)

return [Dp,Dq]
end

ChainRulesCore.@scalar_rule(
gamma_inc(a, x, IND),
@setup(z = exp(-x) * x^(a - 1) / gamma(a)),
(
-∂Γ_∂a(a,x,IND),
z,
ChainRulesCore.NoTangent(),
),
(
+∂Γ_∂a(a,x,IND),
-z,
ChainRulesCore.NoTangent(),
),
)

gamma_vec(a) = gamma_inc(a...)
gamma_inc_q(args...) = gamma_inc(args...)[2]
gamma_inc_q(v) = gamma_inc(v...)[2]


It’s pretty much identical to your code.

What I don’t see is how to make Zygote’s jacobian work with both components of gamma_inc.

1 Like

So, after a little test, this is working as it should for me. Thanks a lot, you nailed it: what I missed was the correct implementation of the cross-derivative.

Where do you think this code should go ? Partly in SpecialFunctions.jl for the ChainRule stuff, partly in ForwardDiff.jl for the ForwardDiff stuff ? So with 2 different PRs IMHO.

Ah, sorry… I wrote my own and only skimmed yours.

(∂1[1] * da + ∂2[1] * dx,∂1[2] * da + ∂2[2] * dx)


is not correct. ∂1 already multiplies with da internally: ∂Γ_∂a(_a,x,ind)*partials(a).

The ChainRules rules should go in SpecialFunctions I suppose, which then will need to depend on HypergeometricFunctions… Don’t know if the maintainers will go for that.

Same with ForwardDiff. They define dgamma_inc/dx themselves, and that could be extended, but again would necessitate another dependence.

There is also ForwardDiffChainRules to translate ChainRules frules to ForwardDiff.

I’m not a contributer to either ecosystem though, so I’d say talk to the maintainers before authoring any PRs

1 Like

HypergeometricFunctions depends on SpecialFunctions, so we might be stuck here.

Yes i’ll open issues and we’ll see. Thanks again for your help !

1 Like

For the record, a follow up there : Something specific for pFq([a,a],[a+1,a+1],-x) ? · Issue #50 · JuliaMath/HypergeometricFunctions.jl · GitHub. This hypergeometric function is taking 95% of my runtime, so I am now trying to reduce this.