How to tell ForwardDiff what i want it to do?

Hey,

This is a follow up to this previous post where I was looking for the right derivatives of the gamma_inc function. I found them. Now i am trying to tell ForwardDiff to use them… with issues.

The function has two arguments a and x. The derivtive w.r.t. x is already in ForwardDiff:

ForwardDiff.derivative(x -> gamma_inc(2,x)[2],2.0) # Works

I just added the derivative w.r.t a:

using SpecialFunctions, HypergeometricFunctions, ForwardDiff
function Γ(a,x)
    return gamma_inc(a,x)[2]
end
function ∂Γ_∂a(a,x)
    g = gamma(a)
    dg = digamma(a)
    G = Γ(a,x)
    lx = log(x)
    r = pFq([a,a],[a+1,a+1],-x)
    return a^(-2) * x^a * r/g + (1 - G)*(dg - lx)
end
function SpecialFunctions.gamma_inc(d::ForwardDiff.Dual{T,<:Real}, x::Real, ind::Integer) where {T}
    a = ForwardDiff.value(d)
    p, q = SpecialFunctions.gamma_inc(a, x, ind)
    ∂q = ∂Γ_∂a(a,x)
    return (ForwardDiff.Dual{T}(p, -∂q), ForwardDiff.Dual{T}(q, ∂q))
end
ForwardDiff.derivative(a -> gamma_inc(a,3.0)[2],2.0) # Works 

But somehow, forwardiff does not want to pass dual numbers through both parameters :

ForwardDiff.derivative(a -> gamma_inc(a,3+a)[2],2.0) # Doesnt work. 

Fair enough. Remembering my 101 lectures on the chain rule, I craft f(dual,dual) from f(dual,x) and f(a,dual) as follows:

function SpecialFunctions.gamma_inc(da::ForwardDiff.Dual{T,<:Real}, dx::ForwardDiff.Dual{T,<:Real}, ind::Integer) where {T}
    a = ForwardDiff.value(da)
    x = ForwardDiff.value(dx)
    ∂1 = SpecialFunctions.gamma_inc(da, x, ind::Integer)
    ∂2 = SpecialFunctions.gamma_inc(a, dx, ind::Integer)
    return ∂1 .* da .+ ∂2 .* dx
end

And now it works :

ForwardDiff.derivative(a -> gamma_inc(a,3+a)[2],2.0) # Works ! 

Youpi ! So i go back to what i was doing, trying to use it… but no :

ForwardDiff.gradient(x -> gamma_inc(x[1]+x[2],x[1]-x[2])[2],[3.0,1.0]) # Still does not work... 

I think that the derivatives i added should be enough to compute this gradient ! I do not understand what I missed. Could someone help?

From the stack trace, I think it may come from the line ∂1 .* da .+ ∂2 .* dx in my return statement, where, in case we are taking gradients w.r.t. several variables, the multiplications / additions do not work correctly; Maybe i should re-write this somehow differently ?


EDIT :

If I change the function to the following to see things :

function SpecialFunctions.gamma_inc(da::ForwardDiff.Dual{T,<:Real}, dx::ForwardDiff.Dual{T,<:Real}, ind::Integer) where {T}
    a = ForwardDiff.value(da)
    x = ForwardDiff.value(dx)
    ∂1 = SpecialFunctions.gamma_inc(da, x, ind::Integer)
    ∂2 = SpecialFunctions.gamma_inc(a, dx, ind::Integer)
    @show da
    @show dx
    @show ∂1[1]
    @show ∂1[1]
    @show ∂2[1]
    @show ∂2[2]
    return (∂1[1] * da + ∂2[1] * dx,∂1[2] * da + ∂2[2] * dx)
end

Then the stacktrace I Have looks like :

julia> ForwardDiff.gradient(x -> gamma_inc(x[1]+x[2],x[1]-x[2])[2],[3.0,1.0]) # Still does not work... 
da = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(4.0,1.0,1.0)
dx = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(2.0,1.0,-1.0)
∂1[1] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.1428765395014529,-0.1302623114706677)
∂1[1] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.1428765395014529,-0.1302623114706677)
∂2[1] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.1428765395014529,0.1804470443154836,-0.1804470443154836)
∂2[2] = Dual{ForwardDiff.Tag{var"#60#61", Float64}}(0.8571234604985472,-0.1804470443154836,0.1804470443154836)
ERROR: MethodError: no method matching _mul_partials(::ForwardDiff.Partials{1, Float64}, ::ForwardDiff.Partials{2, Float64}, ::Float64, ::Float64)
Closest candidates are:
  _mul_partials(::ForwardDiff.Partials{0, A}, ::ForwardDiff.Partials{N, B}, ::Any, ::Any) where {N, A, B} at C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\partials.jl:141  
  _mul_partials(::ForwardDiff.Partials{N, A}, ::ForwardDiff.Partials{0, B}, ::Any, ::Any) where {N, A, B} at C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\partials.jl:142  
  _mul_partials(::ForwardDiff.Partials{N}, ::ForwardDiff.Partials{N}, ::Any, ::Any) where N at C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\partials.jl:118
  ...
Stacktrace:
  [1] dual_definition_retval(#unused#::Val{ForwardDiff.Tag{var"#60#61", Float64}}, val::Float64, deriv1::Float64, partial1::ForwardDiff.Partials{1, Float64}, deriv2::Float64, partial2::ForwardDiff.Partials{2, Float64})
    @ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\dual.jl:203
  [2] *
    @ C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\dual.jl:271 [inlined]
  [3] gamma_inc(da::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}, dx::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}, ind::Int64)     
    @ Main .\Untitled-1:55
  [4] gamma_inc(a::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2})
    @ SpecialFunctions C:\Users\lrnv\.julia\packages\SpecialFunctions\hefUc\src\gamma_inc.jl:858
  [5] (::var"#60#61")(x::Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}})
    @ Main .\Untitled-1:60
  [6] vector_mode_dual_eval!(f::var"#60#61", cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}}, x::Vector{Float64})
    @ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\apiutils.jl:37
  [7] vector_mode_gradient(f::var"#60#61", x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}})
    @ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\gradient.jl:106
  [8] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}}, ::Val{true})
    @ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\gradient.jl:19
  [9] gradient(f::Function, x::Vector{Float64}, cfg::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{var"#60#61", Float64}, Float64, 2}}}) (repeats 2 times)
    @ ForwardDiff C:\Users\lrnv\.julia\packages\ForwardDiff\QdStj\src\gradient.jl:17
 [10] top-level scope
    @ Untitled-1:60

julia> 

I think the issue is that the four partials only have 2 values : there is only one derivative w.r.t. a or x, while da and dx have 3 values: there is two derivatives. Which is logic.

How could I modify the ∂1[1],∂1[2],∂2[1] and ∂2[2] to have one more (zero) derivative in them so that the multiplication works ?


EDIT :I tried with ReverseDiff, but no luck yet :

using SpecialFunctions, HypergeometricFunctions, ReverseDiff, ChainRulesCore
function ∂Γ_∂a(a,x)
    g = gamma(a)
    dg = digamma(a)
    G = gamma_inc(a,x)[1]
    lx = log(x)
    r = pFq([a,a],[a+1,a+1],-x)
    return a^(-2) * x^a * r/g + G*(dg - lx)
end
# Copied and modified the chainrule from SpecialFunctions.jl: 
ChainRulesCore.@scalar_rule(
    SpecialFunctions.gamma_inc(a, x, IND),
    @setup(z = exp(-x) * x^(a - 1) / gamma(a)),
    (
        -∂Γ_∂a(a,x),
        z,
        ChainRulesCore.NoTangent(),
    ),
    (
        ∂Γ_∂a(a,x),
        -z,
        ChainRulesCore.NoTangent(),
    ),
)

# Shananigans to call reversediff
# I cannot find the right way to call it... 
# And when it looks like i manage to clal it correclty, it does not use my chainrule...
gamma_inc_arr(x) = gamma_inc.(x,2.3)
ReverseDiff.gradient(gamma_inc_arr, ([1.0],))
1 Like

I think I have something, looks like I missed a ulitplication by partials() in the ForwardDiff implementation :

using SpecialFunctions, HypergeometricFunctions, ForwardDiff

function SpecialFunctions.gamma_inc(d::ForwardDiff.Dual{T,<:Real}, x::Real, ind::Integer) where {T}
    a = ForwardDiff.value(d)
    g = gamma(a)
    dg = digamma(a)
    p, q = SpecialFunctions.gamma_inc(a, x, ind)
    lx = log(x)
    r = pFq([a,a],[a+1,a+1],-x)
    dq = a^(-2) * x^a * r/g + p*(dg - lx)
    ∂q = dq * ForwardDiff.partials(d) ############ <<<------- I missed this multiplication. 
    return (ForwardDiff.Dual{T}(p, -∂q), ForwardDiff.Dual{T}(q, ∂q))
end
function SpecialFunctions.gamma_inc(da::ForwardDiff.Dual{T,<:Real}, dx::ForwardDiff.Dual{T,<:Real}, ind::Integer) where {T}
    a = ForwardDiff.value(da)
    x = ForwardDiff.value(dx)
    ∂1 = SpecialFunctions.gamma_inc(da, x, ind::Integer) # this partial is w.r.t. a
    ∂2 = SpecialFunctions.gamma_inc(a, dx, ind::Integer) # this partial is w.r.t. x
    return (∂1[1] * da + ∂2[1] * dx,∂1[2] * da + ∂2[2] * dx)
end

# Now everything works : 
ForwardDiff.derivative(x -> gamma_inc(2,x)[2],2.0)
ForwardDiff.derivative(a -> gamma_inc(a,3.0)[2],2.0)
ForwardDiff.derivative(a -> gamma_inc(a,3+a)[2],2.0) 
ForwardDiff.gradient(x -> gamma_inc(x[1]+x[2],x[1]-x[2])[2],[3.0,1.0])

EDIT : No, this still seems ot be wrong.

Honestly, your approach looks good to me!

For reference, note down the gradient at [2.0,1.0] approximated with finite differences:

julia> dGda = (gamma_inc(2.01,1.0)[2] - gamma_inc(1.99,1.0)[2])/(2*0.01)
0.2761963494263997

julia> dGdx = (gamma_inc(2.0,1.01)[2] - gamma_inc(2.0,0.99)[2])/(2*0.01)
-0.36787330975545096

With your ChainRules definition, and

gamma_inc_q(args...) = gamma_inc(args...)[2]
gamma_inc_q(v) = gamma_inc(v...)[2]

to extract the second component of gamma_inc, Zygote yields

julia> Zygote.gradient(x->gamma_inc_q(1.0,x), 1.0)
(-0.36787944117144233,)

julia> Zygote.gradient(a->gamma_inc_q(a,1.0), 2.0)
(0.2761960457028353,)

julia> J = Zygote.gradient(gamma_inc_q, [2.0,1.0])[1]
[0.2761960457028353, -0.36787944117144233]

in full agreement.

Via the chain rule, \frac{d \gamma(f(a),g(a))}{d a} = \nabla\gamma \cdot (f'(a),g'(a))^t. As an example chose f(a) = 2a, g(a)=a

julia> Zygote.gradient(a->gamma_inc_q(2a,a), 1.0)
(0.18451265023422825,)

julia> dot(J, [2.0,1.0])
0.18451265023422825

The ForwardDiff rule you defined yields the same results for me. I have however defined

function SpecialFunctions.gamma_inc(a::Dual{T,<:Real,N}, x::Dual{T, <:Real,N}, ind::Integer) where {T,N}
    da = SpecialFunctions.gamma_inc(a, value(x), ind)[2]
    dx = SpecialFunctions.gamma_inc(value(a), x, ind)[2]

    par = partials(da) + partials(dx)

    Dq = Dual{T}(value(da), par)
    Dp = Dual{T}(1-value(da), -par)

    return [Dp,Dq]
end

to return a vector of duals in order to facilitate using jacobian which expects an array to be returned.

julia> gamma_vec(a) = gamma_inc(a...)

julia> ForwardDiff.jacobian(gamma_vec, [2.0, 1.0])
2×2 Matrix{Float64}:
 -0.276196   0.367879
  0.276196  -0.367879
julia> derivative(a -> gamma_inc_q(a,1.0),2.0)*2 +
         derivative(x -> gamma_inc_q(2.0,x),1.0) ==
       derivative(x->gamma_inc_q(2x,x), 1.0) ==
       (ForwardDiff.jacobian(gamma_vec, [2.0, 1.0])*[2.0,1.0])[2]
true

Would you mind summarizing what you defined all at once ? Im getting lost a little. I’m specifically looking at the forwardiff version, for now on.

I think that my gradients are wrong, since they do not match with finite differences, and i suspect this is because i do not have the right function for gamma_inc(dual,dual)… So what did you clearly defined ?

Understandable :smile:

import ForwardDiff: Dual, derivative, jacobian, value, partials, Tag, Partials
using SpecialFunctions, HypergeometricFunctions, ForwardDiff
using Zygote
using ChainRulesCore

function Γ(a,x,ind::Integer)
    return gamma_inc(a,x,ind)[2]
end
function ∂Γ_∂a(a,x,ind)
    g = gamma(a)
    dg = digamma(a)
    G = Γ(a,x,ind)
    lx = log(x)
    r = pFq([a,a],[a+1,a+1],-x)
    return a^(-2) * x^a * r/g + (1 - G)*(dg - lx)
end
function SpecialFunctions.gamma_inc(a::ForwardDiff.Dual{T,<:Real}, x::Real, ind::Integer) where {T}
    _a = ForwardDiff.value(a)
    q = Γ(_a, x, ind)
    ∂q = ∂Γ_∂a(_a,x,ind)*partials(a)
    return (ForwardDiff.Dual{T}(1-q, -∂q), ForwardDiff.Dual{T}(q, ∂q))
end
function SpecialFunctions.gamma_inc(a::Dual{T,<:Real,N}, x::Dual{T, <:Real,N}, ind::Integer) where {T,N}
    da = SpecialFunctions.gamma_inc(a, value(x), ind)[2]
    dx = SpecialFunctions.gamma_inc(value(a), x, ind)[2]

    par = partials(da) + partials(dx)

    Dq = Dual{T}(value(da), par)
    Dp = Dual{T}(1-value(da), -par)

    return [Dp,Dq]
end

ChainRulesCore.@scalar_rule(
    gamma_inc(a, x, IND),
    @setup(z = exp(-x) * x^(a - 1) / gamma(a)),
    (
        -∂Γ_∂a(a,x,IND),
        z,
        ChainRulesCore.NoTangent(),
    ),
    (
        +∂Γ_∂a(a,x,IND),
        -z,
        ChainRulesCore.NoTangent(),
    ),
)

gamma_vec(a) = gamma_inc(a...)
gamma_inc_q(args...) = gamma_inc(args...)[2]
gamma_inc_q(v) = gamma_inc(v...)[2]

It’s pretty much identical to your code.

What I don’t see is how to make Zygote’s jacobian work with both components of gamma_inc.

1 Like

So, after a little test, this is working as it should for me. Thanks a lot, you nailed it: what I missed was the correct implementation of the cross-derivative.

Where do you think this code should go ? Partly in SpecialFunctions.jl for the ChainRule stuff, partly in ForwardDiff.jl for the ForwardDiff stuff ? So with 2 different PRs IMHO.

Ah, sorry… I wrote my own and only skimmed yours.

(∂1[1] * da + ∂2[1] * dx,∂1[2] * da + ∂2[2] * dx)

is not correct. ∂1 already multiplies with da internally: ∂Γ_∂a(_a,x,ind)*partials(a).

The ChainRules rules should go in SpecialFunctions I suppose, which then will need to depend on HypergeometricFunctions… Don’t know if the maintainers will go for that.

Same with ForwardDiff. They define dgamma_inc/dx themselves, and that could be extended, but again would necessitate another dependence.

There is also ForwardDiffChainRules to translate ChainRules frules to ForwardDiff.

I’m not a contributer to either ecosystem though, so I’d say talk to the maintainers before authoring any PRs :slight_smile:

1 Like

HypergeometricFunctions depends on SpecialFunctions, so we might be stuck here.

Yes i’ll open issues and we’ll see. Thanks again for your help !

1 Like

For the record, a follow up there : Something specific for pFq([a,a],[a+1,a+1],-x) ? · Issue #50 · JuliaMath/HypergeometricFunctions.jl · GitHub. This hypergeometric function is taking 95% of my runtime, so I am now trying to reduce this.