Integration fails: “Warning: Automatic dt set the starting dt as NaN, causing instability”

I’ve written some code to perform parameter estimation on a system of ODEs, and I keep getting the following error:

┌ Warning: First function call produced NaNs. Exiting.
└ @ OrdinaryDiffEq C:\Users\Michael\.julia\packages\OrdinaryDiffEq\5GBYL\src\initdt.jl:76
┌ Warning: Automatic dt set the starting dt as NaN, causing instability.
└ @ OrdinaryDiffEq 

(I actually got several hundred of these warnings at once.)

Here are some relevant details about the problem:

  1. This is the ODE system whose parameters I’m trying to estimate:
function One_Age_Model(du,u,pars,t)
    S, E, P, A, AR, I, H, R, D, Hc = u
    βSA, βSI, μ, σ₁, σ₂, ϕ, M, M_AR, γ, ω, χ, ψ, ηSA, ηSI, tQ = pars
    
    βSA_t = βSA*(ηSA + (1 - ηSA)*0.5*(1 - tanh(2*(t-tQ))))
    βSI_t = βSI*(ηSI + (1 - ηSI)*0.5*(1 - tanh(2*(t-tQ))))
   
    dS = -βSA_t*S*(A+P) - βSI_t*S*I - μ*S
    dE = βSA_t*S*(A+P) + βSI_t*S*I - σ₁*E
    dP = (1-ϕ)*σ₁*E - σ₂*P
    dA = ϕ*σ₁*E - M_AR*A
    dAR = M_AR*A
    dI = σ₂*P - M*I
    dH = γ*M*I - (1-ω)*χ*H - ω*ψ*H
    dR = (1-γ)*M*I + (1-ω)*χ*H
    dD = ω*ψ*H
    dHc = γ*M*I
    du .= [dS, dE, dP, dA, dAR, dI, dH, dR, dD, dHc]
    return nothing
end
  1. Here’s my call to the integrator:
  ODE_prob = ODEProblem(One_Age_Model, ODE_ICs_scaled, t_span, ODE_params)
  ODE_sol = solve(ODE_prob, integrator, reltol = 1e-8, abstol = 1e-8 saveat = t_obs)

Here, t_span = (0.0,100.0) and t_obs = collect(0.0:1.0:100.0).

  1. I’m using ForwardDiff for the optimization, and I recently learned that the ODE solvers apparently also use ForwardDiff (I did not know this before…)

Here’s a quick summary of my code: I’m using the so-called “boostrapping” approach to parameter estimation, in which I generated 200 sets of “simulated” data from a set of observed data by adding random noise to the observed data. For each set of simulated data, I’m running 5 optimizations using IPNewton from Optim (so 5 x 200 = 1000 total optimizations). For each optimization I give the optimizer a random initial guesses, and I record the best (out of 5) minimizers for each data set.

I got the above error message when the program was 97% of the way through the 1000 optimizations (after about 10 hrs!), and when it was finished, it returned a list of just 2 minimizers (arg!! Why?!). So it seems as if the integration was failing throughout the entire program pretty much…I’m very confused because I’ve successfully run my code on small batches (~50 total optimizations). The occurrence of the error seems somewhat random.

So my questions are: How did the initial dt even get set to NaN? Is there a way to set the initial dt manually to prevent this from happening? Is the issue with ForwardDiff somehow, or is it possibly the ODE model itself? Any other things I should try?

I’m less inclined to believe the ODE model is the problem, as I’ve solved it thousands of times before in previous parameter estimation problems. Most of the time it integrates without a hitch, but other times this NaN dt error happens…

Any insights or troubleshooting suggestions would be hugely appreciated!

That means you probably found some bad parameters. Think about something like the SIR model, if the recovery rate goes negative, then the solution quickly diverges to infinity! This can happen during optimization of parameters if you aren’t careful. You want to add something like, if diverge then loss is Inf. Divergence can be checked via sol.retcode == :Success. The Inf in something like Optim.jl will make it reject the step in parameter space and try again.

1 Like

Thanks for your reply! This is very helpful.

Shortly before you posted I managed to get it working by simply putting dt = 1e-2 in the arguments list of solve(). I ran 1000 total optimizations and didn’t get any errors :slight_smile: I’ll still implement your suggestion, just in case. (The If/Else block around each call to the solve() shouldn’t hurt performance very much, right?)

I still don’t understand how my original error occurred. I attempted to reproduce the error with a divergent solution as follows:

function f!(du,u,p,t)
    du[1] = exp(u[1])
    du[2] = u[2]
end

prob = ODEProblem(f!, [1.0, 1.0], (0.0,10.0))
sol = solve(prob, Tsit5(), saveat = 0.0:1.0:10.0)

But this gave a different (though more helpful) error message:

┌ Warning: dt <= dtmin. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ DiffEqBase C:\Users\Michael\.julia\packages\DiffEqBase\ORQhu\src\integrator_interface.jl:342

So it seems like divergence isn’t the cause of the original error. Any other ideas for what might’ve caused it?

Thanks again!

That’s really just stating the same thing. In this case, now there’s a divergence causing the time steps to shorten, and it halts the integration because of hitting minimum dt before getting to infinity. That means you’re near a bad parameter, but not exactly on it to cause a NaN in the first call to f.

Thanks for your reply. I implemented your suggestion (if sol.retcode == :Success, calculate the loss, else return Inf), re-ran a small batch of optimizations, and unfortunately after a few minutes I got about a zillion of the following error messages:

 Warning: Instability detected. Aborting
└ @ DiffEqBase C:\Users\Michael\.julia\packages\DiffEqBase\ORQhu\src\integrator_interface.jl:348

So slightly different than before…are we sure that this is bad parameters and not a bug in the solver? (I do have each parameter constrained to an interval that should be fairly reasonable, so divergence feels rather unlikely…

Any suggestions for handling this error with if/else’s or try/catch’s, or other approach?

Pull out the parameter and check.

Good idea, trying that now!

So I set up a test in which I loop through 100,000 different sets of parameters/initial conditions (sampled uniformly from the box constraints that I’m using in the optimization) and integrate the system. After the call to the integrator inside the loop, I put

if ODE_sol.retcode != :Success
        println(ODE_sol.retcode)
end

But not a single warning or error was produced from the 100,000 integrations.

Any other suggestions for me to try? Also, what I’d really like is to have the program gracefully handle any blow-ups or other issues during integration, if/when they arise. This would be much easier than trying to identify all sets of “bad” parameters beforehand, and then trying to avoid picking them.

I can send you some more code to reproduce the integration if it’ll help.

Thanks again.

Did you do the exact ones that threw the warning during the optimization?

Hi Chris (apologies for the very delayed reply! I had to step away from Julia for a bit to tend to some other work…):

Yes, I used the same exact initial guesses for the optimizer. I think the best thing for me to do is to create a make a minimal(ish) reproducible example, so that’s what I’m working on now.

Would it be ok for us to continue this discussion on Slack?

Thanks for your help!

So I did another test where I tried to print out any parameter values where the ODE solver failed during the course of the optimization. Well, it did indeed fail after a couple runs and here’s what it printed out (just returning the current value of the objective function’s argument):

The x values that were passed to objective(): 
ForwardDiff.Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), 
Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}, Float64, 10}[Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String},
 DataFrame}, Float64}}(NaN,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real}, 
Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64,
 typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}}(NaN,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real}, 
Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64,
 typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real}, 
Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real},
 Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String},
 DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real}, 
Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64},
 Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real},
 Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real}, 
Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real},
 Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64}, 
Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String},
 DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real},
 Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64,
 Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64},
 Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String},
 DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), 
Dual{ForwardDiff.Tag{var"#objective#59"{typeof(OneAgeModel!), Vector{Symbol}, 
Vector{Float64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, typeof(mean_square_rel_error), Vector{Real}, 
Vector{Union{Missing, Real}}, Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, 
Int64, Int64, Int64, Vector{String}, DataFrame}, Float64}}
(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#objective#59"
{typeof(OneAgeModel!), Vector{Symbol}, Vector{Float64}, Vector{Int64}, Vector{Int64},
 Vector{Int64}, Vector{Int64}, Vector{Real}, Vector{Real}, typeof(f_ICs), Float64, 
typeof(mean_square_rel_error), Vector{Real}, Vector{Union{Missing, Real}}, 
Vector{Union{Missing, Real}}, Tsit5, Float64, Float64, Int64, Int64, Int64, Vector{String}, 
DataFrame}, Float64}}(NaN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0)]

One observation: There NaN’s in the vectors that look like (NaN,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0) look problematic, though it’s not clear to me what these are: they have length 11, yet the objective function’s argument should have a length of 19 (since that’s the number of variables being minimized)…(And the ODE model has 10 variables).

Also, I’m not sure why it printed this huge long string of symbols rather than just the single tuple of values…

A MRE is in the works!