Differences in ODE Solver behaviour inside and outside Adam Optimizer

Stefano_Giampiccolo · May 31, 2023, 8:35am

Good morning,
I am working with hybrid systems (universal differential equations) using the SciML ecosystem. I apologize if this is a stupid question, but I have a doubt related to a stiff problem I’m working on, which you can find in the snippet I’ve attached below.

From what I can see, the system integration concludes, albeit with an accompanying warning message, when carried out within the cost function of an ADAM optimizer (line 77 of my code). The initial integration yields the value at the final time, whereas subsequent ones register an ‘Inf’ cost. However, performing exactly the same integration operation, with identical parameters outside the optimizer (on line 88), results in failure.

Could you kindly help me understand what’s the difference between the two integrations?

Thank you in advance, Stefano

snippet_fail_integrating_outside_optim.jl (4.0 KB)

I’m using Julia v"1.9.0", with the following packages

ComponentArrays v0.13.12
DifferentialEquations v7.7.0
Lux v0.4.53
Optimization v3.14.2
OptimizationOptimisers v0.1.2
SciMLSensitivity v7.32.0
StableRNGs v1.0.0

ChrisRackauckas · June 2, 2023, 5:25pm

Using Rosenbrock23 with adjoints is a bad idea. The performance properties of the forward pass doesn’t necessary match the performance properties of the reverse pass.

That said, did you check if this was parameter dependent?

Stefano_Giampiccolo · June 2, 2023, 6:01pm

Thank you very much for your response:

Regarding Rosenbrock23 with adjoints, I appreciate your advice; I just assumed that using AD without adjoints would be much slower. Could you suggest an alternative approach that would be suitable for dealing with stiff problems?
When you mention “parameter dependent,” are you referring to whether the difference is influenced by the parameters of the neural network (i.e. the seed used)?

ChrisRackauckas · June 9, 2023, 10:09pm

TRBDF2, FBDF. They have a bit better scaling.

Yes

Stefano_Giampiccolo · June 19, 2023, 9:14am

I apologize for the delayed response, but I was out of the office. I’ve tried with a different seed (rng = StableRNG(64)) and the two integrations are still a little bit different (10^-14, it could be also something related to floating point arithmetic).
Obviously, this difference is totally fine, I just want to understand what happens when the cost function is infty, because I am facing problems in hybrid models related to the numerical stability of the solution during the optimization process.

I’ve analyzed the code using Infiltrator, and it seems to me that every iteration of the optimiser performs the integration twice. The first call occurs at line 31 of the OptimizationOptizers.jl file:

f.grad(G, θ, d...)

The second call is made on the subsequent line (line 32) with the following code:

x = f.f(θ, prob.p, d...)

The results obtained are slightly different (and the results of this second call returns are equals to the ones I get outside the optimiser).

ChrisRackauckas · June 24, 2023, 4:12pm

The second call is what’s without the gradient call. Indeed the gradient call can change the forward steps if it’s something like forward-mode. What gradient type are you using?

Topic		Replies	Views
Odd behaviour regarding Adjoint Sensitivity Analysis Numerics question , sciml , differentialequation	8	110	October 11, 2024
SciMLSensitivity.jl : adjoint_sensitivities and CVODE_BDF crashes for medium sized stiff ODE model Modelling & Simulations question	11	295	March 9, 2023
AD through DataInterpolations, ODE : Gradient calculation, parameter -> time series -> DataInterpolation General Usage zygote , sciml , differentialequation , datainterpolations	2	99	March 12, 2025
Setting up implicit solvers to beat the performance of explicit solvers Numerics diffeq , forwarddiff	10	596	August 24, 2023
Integrators with analytic derivatives w/r/t initial conditions (especially the independent variable) Numerics question , diffeq , ode , differentialequation	2	79	June 13, 2025

Differences in ODE Solver behaviour inside and outside Adam Optimizer

Related topics