ForwardDiff.jacobian fails to differentiate my code and gives a wrong result

Dear Julia users,
I have a complex ODE system in which I use Gaussian overlaps, and I obtain wrong results when I estimate the Jacobian of the system using ForwardDiff.jacobian. I get a diagonal matrix, with zeros everywhere off the diagonal, which is obviously wrong.
Here is a reproducible example:

using ForwardDiff
using Distributions
using QuadGK
using OrdinaryDiffEq

ns=10
epsilon= 0.1
p = epsilon,ns

function LV_model2(u,p,t)
    epsilon,ns = p

    du=similar(u)

    #loop to define derivatives
    for i in 1:ns

        #partial derivatives that will be used to calculate the derivative
        function pop_derivative_p(x)
            overlaps = Vector{}(undef, ns)
            for v in 1:ns
                mu1=u[v]
                function inte(theta)
                    t1= pdf.(Normal.(x,0.1),theta)
                    t2= pdf.(Normal.(mu1,0.1),theta)
                    return(min(t1,t2))
                end
                overlaps[v] = quadgk(inte, -10, 10, rtol=1e-6)[1] #this is the problematic part; if I use inte(x) instead, it seems to work
            end
            d_pop= 1 .- (sum(overlaps)) 
            return d_pop
        end
        ∂f_∂x_p(x) = ForwardDiff.derivative(x->pop_derivative_p(x),x)

        #### Final derivatives
        du[i]=  epsilon * ∂f_∂x_p(u[i])

    end
    return du
end

uinit=rand(ns)
jacob=ForwardDiff.jacobian(u -> LV_model2(u,p,nothing),uinit)

Here is the Jacobian I get: a diagonal matrix with zeros everywhere off the diagonal (the full output is reproduced further down in this thread).

It seems that the problem comes from the calculation of overlaps: if I simply replace overlaps[v] = quadgk(inte, -10, 10, rtol=1e-6)[1] with overlaps[v] = inte(x), I obtain a Jacobian that makes sense.
I also tried to calculate the overlap manually, using sum(inte.([-10:0.01:10;]))*0.01, but I get the same problem as with quadgk.

I don’t understand what causes this wrong result. Has anyone already encountered this problem? Would you have a solution?

Your integration state needs to be dual-valued to propagate the derivatives. Try quadgk(inte, -10*one(x), 10*one(x), rtol=1e-6)[1]
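
(For context: one(x) returns the multiplicative identity of x's type, so when x is a ForwardDiff dual number, the integration endpoints become dual-valued as well. A minimal sketch of what this does:)

using ForwardDiff

x = ForwardDiff.Dual(0.5, 1.0)  # a dual number carrying one partial derivative
one(x)                          # a Dual with value 1.0 and zero partials: same type as x
10 * one(x)                     # the integration endpoint now carries the dual type
one(1)                          # just the integer 1, which would NOT make the endpoints dual-valued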

Thanks for the answer.

quadgk(inte, -10*one(1), 10*one(1), rtol=1e-6)[1] did not change the result, but I figured out that the problem was the min function used to calculate the integral. I should have thought of that before. If I change the minimum to a product, return(t1 * t2), then it works perfectly. But I need this minimum. Does anyone have an idea how to make ForwardDiff handle the min function?

I find it quite problematic that the function fails to differentiate the code properly, yet there is no warning associated with the output.

I’m not sure I understand your issue here. If you’re surprised by the fact that derivatives are “stopped” by the min function, this is not necessarily wrong. Depending on where you evaluate that function, it chooses its first or its second argument and discards the other. So results like the following are in fact correct:

julia> using ForwardDiff

julia> f(x) = min(x, 1)
f (generic function with 1 method)

julia> ForwardDiff.derivative(f, 0.5)
1.0

julia> ForwardDiff.derivative(f, 1.5)
0.0

Does this explain the behavior you’re encountering?
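
(If what you actually need is a derivative that varies smoothly across the crossover point, one generic workaround, not specific to your code, is to replace min with a smooth surrogate such as a log-sum-exp softmin. A minimal sketch:)

using ForwardDiff

# Smooth surrogate for min(a, b); tau controls how sharply the two branches blend.
# Shifting by m keeps the exponentials from underflowing.
function smoothmin(a, b; tau=0.01)
    m = min(a, b)
    return m - tau * log(exp(-(a - m) / tau) + exp(-(b - m) / tau))
end

g(x) = smoothmin(x, 1.0)
ForwardDiff.derivative(g, 0.5)  # ≈ 1.0 away from the crossover, and smooth near x = 1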

Thanks.
I am not sure I really understand what happens here.
In my case, ForwardDiff.derivative actually succeeds in finding the correct derivative:

julia> function f(x)
           function inte(theta)
               t1= pdf.(Normal.(x,0.1),theta)
               t2= pdf.(Normal.(0,0.1),theta)
               return(min(t1,t2))
           end
           quadgk(inte, -10, 10, rtol=1e-6)[1]
        end
f (generic function with 1 method)

julia> 

julia> ForwardDiff.derivative(f, 0.5)
-0.17515563818429902

julia> (f(0.5+0.0001)-f(0.5))/0.0001 #numerical derivative
-0.17522585354640477

But problems arise when using ForwardDiff.jacobian on the overall ODE system: there it gets blocked by the min function. What would explain that ForwardDiff.derivative handles min but ForwardDiff.jacobian does not?
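
(Indeed, on a toy example ForwardDiff.jacobian seems to handle min without trouble; h below is just an illustrative function, not part of my model:)

using ForwardDiff

h(u) = [min(u[1], u[2]), u[1] * u[2]]
ForwardDiff.jacobian(h, [0.5, 1.5])  # first row is [1.0, 0.0]: the min branch u[1] is selected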

For example, if I take the original system from this post:

using ForwardDiff
using Distributions
using QuadGK
using OrdinaryDiffEq

ns=10
epsilon= 0.1
p = epsilon,ns

function LV_model2(u,p,t)
    epsilon,ns = p

    du=similar(u)

    #loop to define derivatives
    for i in 1:ns

        #partial derivatives that will be used to calculate the derivative
        function pop_derivative_p(x)
            overlaps = Vector{}(undef, ns)
            for v in 1:ns
                mu1=u[v]
                function inte(theta)
                    t1= pdf.(Normal.(x,0.1),theta)
                    t2= pdf.(Normal.(mu1,0.1),theta)
                    return(min(t1,t2))
                end
                overlaps[v] = quadgk(inte, -10*one(1), 10*one(1), rtol=1e-6)[1] #this is the problematic part; if I use inte(x) instead, it seems to work
            end
            d_pop= 1 .- (sum(overlaps)) 
            return d_pop
        end
        ∂f_∂x_p(x) = ForwardDiff.derivative(x->pop_derivative_p(x),x)

        #### Final derivatives
        du[i]=  epsilon * ∂f_∂x_p(u[i])

    end
    return du
end

uinit=rand(ns)

If I calculate the Jacobian of that system with automatic differentiation, I get this:

julia> jacob=ForwardDiff.jacobian(u -> LV_model2(u,p,nothing),uinit)
10×10 Matrix{Float64}:
 -11.1521   -0.0     -0.0      -0.0      -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0     -11.7026  -0.0      -0.0      -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0      -0.0     -7.83545  -0.0      -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0      -0.0     -0.0      -3.73824  -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0      -0.0     -0.0      -0.0      -3.92077   -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0      -0.0     -0.0      -0.0      -0.0      -11.9675   -0.0      -0.0      -0.0      -0.0
  -0.0      -0.0     -0.0      -0.0      -0.0       -0.0     -11.6624   -0.0      -0.0      -0.0
  -0.0      -0.0     -0.0      -0.0      -0.0       -0.0      -0.0     -15.0114   -0.0      -0.0
  -0.0      -0.0     -0.0      -0.0      -0.0       -0.0      -0.0      -0.0     -11.1676   -0.0
  -0.0      -0.0     -0.0      -0.0      -0.0       -0.0      -0.0      -0.0      -0.0     -13.4511

While if I do it numerically, I get this:

julia> function make_jacobian(f,u,perturb,p)
           refdy=copy(f(u,p,nothing))
           ns=size(u)[1]
           delta=max.(u*perturb,perturb)   # perturbation proportional to u, floored at perturb
           jacob=Matrix{Real}(undef,ns,ns)
           for i in 1:ns
               u2=copy(u)
               u2[i]=u2[i]+delta[i]
               dynew=copy(f(u2,p,nothing))
               jacob[:,i]=(dynew-refdy)/delta[i]   # forward difference for column i
           end
           return(jacob)
       end
make_jacobian (generic function with 1 method)

julia> 

julia> jacob = make_jacobian(LV_model2,uinit,1e-04,p)
10×10 Matrix{Real}:
 -6.88869    1.35461     0.219163     0.264666     0.205828     0.0         0.0         1.77198      0.463356   -4.03489e-7
  1.35551   -7.51865     0.0          0.0380899    0.046985     1.67476     0.0         0.0          1.56606    -0.496306
  0.21901    0.0        -5.95829      9.11643e-5   6.81269e-5   0.595506    0.0         0.00666408   0.250213    0.807209
  0.26493    0.0381576   9.13327e-5  -3.12883      0.0          0.167864    0.0394609  -0.171674     0.2606      0.0111917
  0.206057   0.0470992   6.82685e-5   0.0         -1.80396      0.282871    0.0575258   1.21243      0.30533     0.00372056
  0.0        1.67382     0.596141     0.167538     0.282762    -1.50435     0.528586    3.01909      1.92635     2.25779
  0.0        0.0         0.0          0.0394253    0.0575153    0.528911  -10.5967      0.0          0.441431    0.0
  1.77248    0.0         0.00666965  -0.170552     1.21221      3.01907     0.0        -6.91513      1.80268     0.453069
  0.415065   1.56529     0.250241     0.260226     0.304755     1.91738     0.434085    1.80238     -4.21366     0.0
  0.0       -0.508036    0.807792     0.0111686    0.00373992   2.25179     0.0         0.452824     0.0       -10.4279

As a rule of thumb, I would trust ForwardDiff’s result more than any handwritten derivative code, even if I had written that code myself. And indeed, when I compare with an existing package doing finite differences, I find the same Jacobian.

I took the opportunity to slightly generalize your code; let me know if you have questions. In particular, I have removed type-unstable initializations like Vector{}(undef, ns), which creates a Vector{Any}; instead, I try to reproduce the input types whenever I create new storage.
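
(To see why the element type matters: ForwardDiff needs storage that can hold its dual numbers, and promote_type picks such a type. A quick sketch:)

using ForwardDiff

x = ForwardDiff.Dual(0.5, 1.0)
eltype(Vector{}(undef, 3))                                   # Any: type-unstable storage
promote_type(Float64, typeof(x))                             # a Dual type that can hold both
eltype(similar(rand(3), promote_type(Float64, typeof(x))))   # Dual-typed storage

Here is the generalized code: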

using ForwardDiff: ForwardDiff
using FiniteDiff: FiniteDiff
using DifferentiationInterface
using Distributions
using QuadGK

function LV_model2(u, p=0.1, t=nothing)
    du = similar(u)
    for i in eachindex(u)
        function pop_derivative_p(x)
            overlaps = similar(u, promote_type(eltype(u), typeof(x)))  # eltype wide enough for x's dual numbers
            for v in eachindex(u)
                mu1 = u[v]
                function inte(theta)
                    t1 = pdf.(Normal.(x, 0.1), theta)
                    t2 = pdf.(Normal.(mu1, 0.1), theta)
                    return min(t1, t2)
                end
                overlaps[v] = quadgk(inte, -10, 10; rtol=1e-6)[1]
            end
            d_pop = 1 - sum(overlaps)
            return d_pop
        end
        ∂f_∂x_p(x) = ForwardDiff.derivative(pop_derivative_p, x)
        du[i] = p * ∂f_∂x_p(u[i])
    end
    return du
end
julia> uinit = rand(10)
10-element Vector{Float64}:
 0.810224010674785
 0.2443960215115959
 0.43526339303965733
 0.666183698303422
 0.021888909945104063
 0.6619612748191114
 0.9652715071510141
 0.9139620854643283
 0.8344151120413347
 0.5246932895063013

julia> jacobian(LV_model2, AutoForwardDiff(), uinit)
10×10 Matrix{Float64}:
 -12.6076  -0.0       -0.0     -0.0     -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0     -9.18058   -0.0     -0.0     -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0     -0.0      -12.979   -0.0     -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0     -0.0       -0.0    -14.4439  -0.0       -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0     -0.0       -0.0     -0.0     -3.94823   -0.0      -0.0      -0.0      -0.0      -0.0
  -0.0     -0.0       -0.0     -0.0     -0.0      -14.4511   -0.0      -0.0      -0.0      -0.0
  -0.0     -0.0       -0.0     -0.0     -0.0       -0.0     -10.3657   -0.0      -0.0      -0.0
  -0.0     -0.0       -0.0     -0.0     -0.0       -0.0      -0.0     -10.5966   -0.0      -0.0
  -0.0     -0.0       -0.0     -0.0     -0.0       -0.0      -0.0      -0.0     -11.8795   -0.0
  -0.0     -0.0       -0.0     -0.0     -0.0       -0.0      -0.0      -0.0      -0.0     -14.3645

julia> jacobian(LV_model2, AutoFiniteDiff(), uinit)
10×10 Matrix{Float64}:
 -12.6076   0.0        0.0      0.0      0.0        0.0       0.0       0.0       0.0       0.0
   0.0     -9.18058    0.0      0.0      0.0        0.0       0.0       0.0       0.0       0.0
   0.0      0.0      -12.979    0.0      0.0        0.0       0.0       0.0       0.0       0.0
   0.0      0.0        0.0    -14.4439   0.0        0.0       0.0       0.0       0.0       0.0
   0.0      0.0        0.0      0.0     -3.94823    0.0       0.0       0.0       0.0       0.0
   0.0      0.0        0.0      0.0      0.0      -14.4511    0.0       0.0       0.0       0.0
   0.0      0.0        0.0      0.0      0.0        0.0     -10.3657    0.0       0.0       0.0
   0.0      0.0        0.0      0.0      0.0        0.0       0.0     -10.5966    0.0       0.0
   0.0      0.0        0.0      0.0      0.0        0.0       0.0       0.0     -11.8795    0.0
   0.0      0.0        0.0      0.0      0.0        0.0       0.0       0.0       0.0     -14.3645

My best guess is that your own version of the finite-differences Jacobian is wrong because of this line:

delta=max.(u*perturb,perturb)

Where did you get this formula from?

Thanks a lot, it is amazing to get a cleaner example; this is super helpful.

I translated an R function into Julia (rootSolve/R/jacobian.full.R at master · cran/rootSolve · GitHub). I think the authors included that line to get a perturbation that is proportional to the variables' values, in order to make sure the perturbation is significant enough to assess its effect. It is indeed not a classic trick.

Thanks for your answer, I think I now understand why I get those zeros in the Jacobian.
It is true that the numerical Jacobian depends on the size of the perturbation. If I decrease it to very small values, I also end up with the same Jacobian as FiniteDiff and ForwardDiff.jacobian, which does not make sense analytically. These zeros are actually due to the limited precision of the integral quadgk(inte, -10, 10; rtol=1e-6)[1]: small perturbations do not propagate through this function, because the integration tolerance is larger than the effect of the perturbation. Unfortunately, lowering the tolerance of the integral also increases the computational cost.

Then are you sure you want to compute the integral with numerical quadrature? What about Monte-Carlo?
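
(For the record, a rough Monte-Carlo sketch of the overlap integral, written as an importance-sampling estimate; overlap_mc and its parameters are illustrative, not from this thread. A fixed random sample keeps the estimator deterministic, so ForwardDiff can differentiate through it:)

using Distributions, ForwardDiff, Random, Statistics

# ∫ min(p1, p2) dθ = E_{θ ~ p2}[ min(p1(θ), p2(θ)) / p2(θ) ]
function overlap_mc(x, mu; n=10_000, rng=Xoshiro(42))
    thetas = rand(rng, Normal(mu, 0.1), n)   # fixed sample from the second normal
    return mean(min(pdf(Normal(x, 0.1), t), pdf(Normal(mu, 0.1), t)) /
                pdf(Normal(mu, 0.1), t) for t in thetas)
end

ForwardDiff.derivative(x -> overlap_mc(x, 0.3), 0.5)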