How to debug failing DifferentialEquations.solve when called with Dual numbers

I try to debug why Turing is not moving forward when exploring the posterior sensitivity of parameters and initial states of a ModelingToolkit ODE problem. I reduced the problem to sample no initial conditions and a single parameter.

I am on a @infiltrate inside a Turing generated model sampled with NUTS starting from parameters that I got from a previous optimization. I used Turing.setadbackend(:forwarddiff). After being called a few times with parameters of type Float64 which yield a successful solution, it is called with parameters of type Dual.

infil> probo.p[6]
Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.30038926213369543,0.30038926213369543)

At this point the solver quits with complaining about minimum time step.

sol = solve(probo, Tsit5(), maxiters = 1e4);
sol.retcode
:DtLessThanMin

And sol contains only time zero and the initial state.

While, when extracting the values from the Duals, the solver works:

probo2 = remake(probo, p = ForwardDiff.value.(probo.p), u0 = ForwardDiff.value.(probo.u0), tspan=ForwardDiff.value.(probo.tspan))
sol2 = solve(probo, Tsit5(), maxiters = 1e4);
sol2.retcode
:Success

For me its very difficult to debug the problem, because the ODE function is generated by ModellingToolkit (from several extended components) and the Log-Likelihood function is generated by Turing. This also makes it difficult to extract a minimal example.

How can I get to the cause of problem and make the analysis work?

I tried solving the Dual problem with relaxed solver accuracy and there occur NaNs in the solution:

sol3 = solve(probo, maxiters = 1e4, force_dtmin=true);
so3l[end]
12-element Vector{Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}, Float64, 1}}: Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(15.821988183339558,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.13080621567221076,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(1405.5348359327868,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.26598009874959616,NaN)
 Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.09081132939953664,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(280.5565654185805,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(170.07104810349725,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.023325496978244327,NaN) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(0.01,0.0) Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(10084.880861955302,NaN)
 Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(1.1609757237464713,NaN)
 Dual{ForwardDiff.Tag{Turing.Essential.var"#f#4"{DynamicPPL.TypedVarInfo{NamedTuple{(:popt,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}, Int64}, Vector{LogNormal{Float64}}, Vector{AbstractPPL.VarName{:popt, Setfield.IndexLens{Tuple{Int64}}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}, DynamicPPL.Model{typeof(sppfit), (:obs, Symbol("##arg#1367")), (Symbol("##arg#1367"),), (), Tuple{NamedTuple{(:cstocks, :cn, :cp), Tuple{Float64, Float64, Float64}}, DataType}, Tuple{DataType}, DynamicPPL.DefaultContext}, DynamicPPL.Sampler{NUTS{Turing.Essential.ForwardDiffAD{0}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext}, Float64}}(100000.0,0.0)

But when I try to reproduce the Dual with a simpler Dual type the solve raises an error:

probo3 = remake(probo2, p = ForwardDiff.Dual{:tag}.(probo2.p,0.0), u0 = ForwardDiff.Dual{:tag}.(probo2.u0), tspan=ForwardDiff.Dual{:tag}.(probo2.tspan))
sol3 = solve(probo3, Tsit5(), maxiters = 1e4);
ERROR: ArgumentError: reducing over an empty collection is not allowedStacktrace:  
[1] _empty_reduce_error()    @ Base ./reduce.jl:301  
...
 [13] sum
    @ ./reducedim.jl:890 [inlined]
 [14] sse(x::Dual{:tag, Float64, 0})
    @ DiffEqBase ~/scratch/twutz/julia_cluster_depots/packages/DiffEqBase/zgvoe/src/forwarddiff.jl:26
...
 [24] ODE_DEFAULT_NORM
    @ ~/scratch/twutz/julia_cluster_depots/packages/DiffEqBase/zgvoe/src/forwarddiff.jl:34 [inlined]
 [25] __init(prob::ODEProblem{Vector{Dual{:tag, Float64, 0}}, Tuple{Dual{:tag, Int64, 0}, Dual{:tag, Int64, 0}}, true, Vector{Dual{:tag, Float64, 1}}, ODEFunction{true, ModelingToolkit.var"#f#387"{RuntimeGeneratedFunctions.RuntimeGeneratedFunctio
n{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0xf2220a6e, 0xb10be9ac, 0x40244113, 0x680b56b0, 0xb7df1a40)}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), Mode
lingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x9a560f27, 0x329261b1, 0x518d4520, 0x67fd7dc0, 0xb526195f)}}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Vect
or{Symbol}, Symbol, ModelingToolkit.var"#377#generated_observed#394"{Bool, ODESystem, Dict{Any, Any}}, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5{typeof(OrdinaryDiffEq.trivia
l_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_
everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Dual{:tag, Int64, 0}, dtmin::Nothing, dtmax::Dual{:tag, Int64, 0}, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, ab
stol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Float64,
 internalnorm::typeof(DiffEqBase.ODE_DEFAULT_NORM), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK), verbose::Bool, timeseries_errors::Bool
, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_e
xtrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})

How can I inspect the MTK generated derivative function of the Problem and call it with Dual inputs?
How can I see which equations of the submodels may cause the NaNs in the partial?
Or am I on a wrong route and should inspect or test something else?

Why Tsit5? If you hit maxiters, that’s normally a sign of stiffness. See the PSA:

I used Tsit5() because it worked well in the Float64 case. I simulate a system close to steady state fixpoint. And because of a former issue with composite solutions default solver.

I also tried with different Algorithms, without change. Actually, I did not hit maxiters, but :DtLessThanMin. I used the maxiters argument above to spend not too much time with parameterizations that are cause problems. When I hit maxiter I return Inf cost or -Inf Log-Likelihood.

The derivative calculation could be less stable than the primal, in which case you’ll need to lower the tolerance and maybe change the solver to get an accurate derivative.

I probably got to the root of the problem or an important symptom. Equations of the following form caused NaNs in partial derivatives when the denominator became very small:

revenue = IfElse.ifelse(α > 0, return / α, 0.0),

Replacing the guard against division by zero to a more conservative number, prevented the problem and allowed the NUTS sampler to work properly.

revenue = IfElse.ifelse(α > 1e-16, return / α, 0.0),

I will describe strategies that helped me getting there in a separate comment.

In addition to advise given in the PSA the following strategies helped me debugging.

Inspecting how the derivate function is called by the solver by wrapping it.
Although I was not able to peek into the derivative function generated by MTK, I was able to replace it by a wrapper in ODEProblem.

df = probos.f
df2 = (du,u,p,t) -> begin
    df(du,u,p,t)
    @show t,u, du
end
probos_d = remake(probos, f = df2)
sol = solve(probos_d, solver);

This allowed me to get to the actual states that are tried by the solver - which I cannot see in the solution that ends earlier when problems occur.

Directly calling the derivative function with Dual numbers at specified state and parameters

dud = PreallocationTools.dualcache(probos.u0)
fo = (p) -> begin
    _du = PreallocationTools.get_tmp(dud, first(p))
    probos.f(_du,probos.u0,p,0)
    @show _du
    sum(abs2, _du)
end 
dfop = ForwardDiff.gradient(fo, probos.p);

This allowed me to explore the problem in a more agile manner.

Overriding equations of an assembled ODESystem
Rather than changing the source code of the different components and reloading (Revise) and reassembling the system, I was able modify the ODESystem and experiment by replacing specific equations.

function sesam_constI2(;name, s = sesam3(;name=:s))
    @variables t
    D = Differential(t)
    @unpack α_RP, return_RP, revenue_RP, syn_Enz_w = s
    eqs = [
        revenue_RP ~ IfElse.ifelse(α_RP > 1e-16, return_RP / (α_RP * syn_Enz_w), 0.0),
    ]
    basesys = s
    sys = ODESystem(eqs, t, [], []; name)
    # extend_overwrite so far only in extend_overwrite.jl in project dir
    sys_ext = extend_overide(sys, basesys) 
end
s2 = sesam3(name=:s) 
sci2 = sesam_constI2(;s=s2,name=:s); equations(sci2)
@named spci2 = plant_sesam_system(sci2,pl) # couples to drivers and calls simplify
probo2 = ODEProblem(spci2, u0site, tspan, psite)
probos = remake(probo2, u0 = ForwardDiff.value.(u)) # u from steps above