Sensitivity fails on DAEs

I am trying to use DifferentialEquations.jl and SciMLSensitivity.jl to build NN controllers for physical systems.

Usually everything works. However, when I explicitly call adjoint_sensitivities from SciMLSensitivity to get the gradient of a loss w.r.t. the parameters of the system, using an algorithm for stiff equations, it fails.

For ODEs I can work around it by using a higher-order algorithm such as Vern9; not ideal, but it works (sketch below).
For DAEs (mass-matrix ODEs in my case) I don’t see a solution, since such algorithms are not compatible with them.
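
To be concrete, the ODE workaround I mean looks roughly like this (a minimal sketch on a toy linear ODE, not my actual system):

using DifferentialEquations
using SciMLSensitivity

# Toy stand-in for my real (non-stiff) model
function f_ode(du, u, p, t)
    du[1] = -p[1] * u[1]
    du[2] = -p[2] * u[2]
    nothing
end
prob_ode = ODEProblem(f_ode, [1.0, 2.0], (0.0, 1.0), [0.5, 1.5])
sol_ode = solve(prob_ode, Tsit5(), reltol = 1e-8, abstol = 1e-8)

# A high-order explicit method for the adjoint pass avoids the stiff solvers'
# internal ForwardDiff Jacobian entirely.
_, dp = adjoint_sensitivities(sol_ode, Vern9();
    sensealg = InterpolatingAdjoint(autojacvec = ZygoteVJP()),
    g = (u, p, t) -> sum(u));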

I checked the suggestions (from the error below), but:

  1. Autodiff is necessary with NNs in my case.
  2. I don’t have any pre-allocated caches or similar (the sketch after this list shows what that suggestion refers to).
  3. Not possible.
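
For context, suggestion 2 in the error below is about routing pre-allocated work buffers through PreallocationTools.jl so they can also hold ForwardDiff dual numbers. A minimal sketch of that pattern (hypothetical RHS and cache, nothing from my actual model):

using PreallocationTools

# Hypothetical right-hand side that needs a scratch buffer. DiffCache lets the
# same buffer hold Float64s normally and ForwardDiff duals during Jacobian autodiff.
scratch = DiffCache(zeros(3))

function f_with_cache!(du, u, p, t)
    tmp = get_tmp(scratch, u)   # buffer sized/typed to match u's element type
    @. tmp = p[1] * u
    @. du = tmp - u
    nothing
end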

Error:

ERROR: First call to automatic differentiation for the Jacobian
failed. This means that the user `f` function is not compatible
with automatic differentiation. Methods to fix this include:

1. Turn off automatic differentiation (e.g. Rosenbrock23() becomes
   Rosenbrock23(autodiff=false)). More details can befound at
   https://docs.sciml.ai/DiffEqDocs/stable/features/performance_overloads/
2. Improving the compatibility of `f` with ForwardDiff.jl automatic 
   differentiation (using tools like PreallocationTools.jl). More details
   can be found at https://docs.sciml.ai/DiffEqDocs/stable/basics/faq/#Autodifferentiation-and-Dual-Numbers
3. Defining analytical Jacobians. More details can be
   found at https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODEFunction

Note: turning off automatic differentiation tends to have a very minimal
performance impact (for this use case, because it's forward mode for a
square Jacobian. This is different from optimization gradient scenarios).
However, one should be careful as some methods are more sensitive to
accurate gradients than others. Specifically, Rodas methods like `Rodas4`
and `Rodas5P` require accurate Jacobians in order to have good convergence,
while many other methods like BDF (`QNDF`, `FBDF`), SDIRK (`KenCarp4`),
and Rosenbrock-W (`Rosenbrock23`) do not. Thus if using an algorithm which
is sensitive to autodiff and solving at a low tolerance, please change the
algorithm as well.

TypeError: in typeassert, expected Float64, got a value of type ForwardDiff.Dual{Nothing, Float64, 12}

As an MWE, the Robertson example from the documentation can be used:

using DifferentialEquations
using SciMLSensitivity

function rober(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁ * y₁ + k₃ * y₂ * y₃
    du[2] = k₁ * y₁ - k₃ * y₂ * y₃ - k₂ * y₂^2
    du[3] = y₁ + y₂ + y₃ - 1
    nothing
end
# Singular mass matrix: the zero third row turns the last equation into the
# algebraic constraint y₁ + y₂ + y₃ = 1, so this is a DAE in mass-matrix form.
M = [1.0 0 0
     0 1.0 0
     0 0 0]
f = ODEFunction(rober, mass_matrix = M)
prob_mm = ODEProblem(f, [1.0, 0.0, 0.0], (0.0, 1e5), [0.04, 3e7, 1e4])
sol = solve(prob_mm, Rodas5(), reltol = 1e-8, abstol = 1e-8)

sensealg = InterpolatingAdjoint(autojacvec=ZygoteVJP(), checkpointing=true)
_, dp = adjoint_sensitivities(sol, Rosenbrock23(); sensealg=sensealg, g=(u, p, t)->sum(u));
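
For what it’s worth, suggestion 3 from the error message would be straightforward on this MWE (it’s just not an option for my real NN-based model). A sketch of what it would look like here:

# Analytical Jacobian of rober (∂du/∂u), passed via ODEFunction's jac keyword.
function rober_jac!(J, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    J[1, 1] = -k₁;  J[1, 2] = k₃ * y₃;              J[1, 3] = k₃ * y₂
    J[2, 1] = k₁;   J[2, 2] = -k₃ * y₃ - 2k₂ * y₂;  J[2, 3] = -k₃ * y₂
    J[3, 1] = 1.0;  J[3, 2] = 1.0;                  J[3, 3] = 1.0
    nothing
end
f_jac = ODEFunction(rober, mass_matrix = M, jac = rober_jac!)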

Is there an algorithm that works here?
Any idea what’s going on? Is sensitivity analysis impossible for DAEs at the moment?

DifferentialEquations version = “7.7.0”
SciMLSensitivity version = “7.20.0”

It’s definitely possible. That exact problem is in the test suite right here:

What’s going on with your case is curious. The following works:

_, dp = adjoint_sensitivities(sol, Rosenbrock23(autodiff=false); sensealg=sensealg, g=(u, p, t)->sum(u));

which of course is using autodiff everywhere except for the Jacobian calculation of the stiff ODE solver (so it’s still adjoints and vjps), and that works fine. The test case doesn’t need to do that though, so I’ll dig into what’s going on here.

Thanks Chris!

I noticed it would work without autodiff; however, I found it was too slow for my application with NNs. Do you have any other suggestions?

Tell me if I can help you in any way with this issue!

That shouldn’t affect performance. This is forward-mode autodiff of the Jacobians in the ODE solve, not the adjoint method or the vjp calculation, so autodiff=false should be within about 2x of the autodiff version.

I’m turning this into a test case as we speak. A working version is:

sensealg = InterpolatingAdjoint(autojacvec=ZygoteVJP())
_, dp = adjoint_sensitivities(sol, Rosenbrock23(); sensealg=sensealg, g=(u, p, t)->sum(u), dgdu_continuous = (du,u,p,t)->du.=1);

Basically, it seems to be an interaction between stiff ODE solvers with autodiff on, not supplying the dg derivative, and checkpointing. In this version I turned off checkpointing and supplied dgdu_continuous, and it works. I have a fix for the checkpointing part and will put a patch in once I figure out why not supplying dgdu_continuous is related, but this should be enough for you to test the real impact on your code.

For performance gains, I would suspect you wouldn’t want to use Rosenbrock methods for the adjoint. Things like TRBDF2 will be a lot faster in the backpass, regardless of the forward pass’s ODE solver choice.
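
If you want to sanity-check that on the MWE, something like the following gives a rough comparison (assumes BenchmarkTools.jl, and uses autodiff=false so all three calls run given the bug above; this is not a careful benchmark):

using BenchmarkTools

g(u, p, t) = sum(u)
sa = InterpolatingAdjoint(autojacvec = ZygoteVJP(), checkpointing = true)

# Each line runs the full adjoint solve repeatedly, so this takes a while.
@btime adjoint_sensitivities($sol, Rosenbrock23(autodiff = false); sensealg = $sa, g = $g);
@btime adjoint_sensitivities($sol, TRBDF2(autodiff = false); sensealg = $sa, g = $g);
@btime adjoint_sensitivities($sol, QNDF(autodiff = false); sensealg = $sa, g = $g);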

In my use case it’s weird: when I tried without autodiff I got no error, but the process hung indefinitely (for so many hours that I had to kill it), which is why I assumed it wasn’t feasible with a neural network.

Now I tried with QNDF(autodiff=false) and it works, even though sometimes I get a warning because it’s a very stiff problem. It seems that only multistep methods succeed, while Rosenbrock methods hang.
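
On the MWE, the analogue of what I’m running is the following (my real code just swaps in my own problem and loss):

sensealg = InterpolatingAdjoint(autojacvec = ZygoteVJP(), checkpointing = true)
_, dp = adjoint_sensitivities(sol, QNDF(autodiff = false); sensealg = sensealg, g = (u, p, t) -> sum(u));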

I also tried your working version on the MWE:

sensealg = InterpolatingAdjoint(autojacvec=ZygoteVJP())
_, dp = adjoint_sensitivities(sol, Rosenbrock23(); sensealg=sensealg, g=(u, p, t)->sum(u), dgdu_continuous = (du,u,p,t)->du.=1);

and it doesn’t give errors.
However, if I change sum to prod, with dgdu_continuous = (du,u,p,t) -> du .= [u[2]*u[3], u[1]*u[3], u[1]*u[2]], it gives me the usual Jacobian error. I switched to prod because it also failed on my own loss, where I defined dgdu_continuous with a wrapper around Zygote.gradient; however, the problem seems to be deeper and not related to Zygote.
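
For concreteness, the failing variant is the same call with only g and dgdu_continuous changed:

sensealg = InterpolatingAdjoint(autojacvec = ZygoteVJP())
_, dp = adjoint_sensitivities(sol, Rosenbrock23(); sensealg = sensealg,
    g = (u, p, t) -> prod(u),
    dgdu_continuous = (du, u, p, t) -> du .= [u[2] * u[3], u[1] * u[3], u[1] * u[2]]);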

I’ll continue with QNDF(autodiff=false) until your fix lands; I have no idea what’s going on here.
Thanks for everything.