What does the "maxiters" solver option do?

donkeysaddle · April 11, 2020, 2:29am

New to Numerical Analysis, so please excuse the rookie question and forgive me if this has been discussed somewhere else, I couldn’t find it.

According to the documentation:

maxiters : Maximum number of iterations before stopping. Defaults to 1e5.

Is this the maximum number of steps the solver will take? One idea being that if the equation is stiff you’ll need to set it to be quite large?

I’ve gotten the associated error for parameter estimation using the Optim optimizer:

┌ Warning: Interrupted. Larger maxiters is needed.

And have dealt with it successfully by increasing the maxiters value, but I’d like to know what’s going on. Related, is there a way to see what parameter values the Optim optimizer has landed on that are causing it to throw this error?

Thanks,

DS

ChrisRackauckas · April 11, 2020, 2:58am

Yes

Not necessarily. If the problem is stiff and you’re using an explicit Runge-Kutta method, yes then it may need to be large. Usually if this is hit, it’s avoiding useless computations because you can have something like a problem where your dt → 1e-14 and then wants to churn away at the problem for centuries, so hitting this limit then tells you you should probably change something about the solver choice or maybe your model might not be stable (i.e. your ODEProblem might not be the one you thought you wrote down). But if you just have a long problem, like a long tspan or something, then yeah there’s nothing wrong with increasing the maxiters: that’s why the option is there.

It’s a warning and not an error because it’s common in parameter optimization that you can hit parameters which just don’t make sense for your model, in which case you can be producing some wild values, so the integrator is exiting. In those cases, you can just throw an infinite loss to the optimizer and it’ll know to avoid that parameter region.

show_trace=true or something like that in the optimizer keyword arguments.

donkeysaddle · April 11, 2020, 3:08am

This is awesome, thanks so much.

Sorry about the last question.

DS

Volker · December 14, 2020, 2:55pm

I have a question. How I can simulate/solve the ODE i.e. for quite long simulation duration? I.e. 2000s.

I ask, because I got this warning as well. Not during the optimization, but later when I just solved the ODE for a long tspan. I tried to increase the maxiter parameter, but then the solving didn´t finish (I let it run in the background for some hours). I could imagine, that simply stacking smaller solvings would work, but I think there is perhaps a more comfortable way, which I don´t know.

I got the warning in the context of the problem, which is described under the following link.

ChrisRackauckas · December 15, 2020, 10:34am

This has nothing to do with a long simulation duration and all to do with your ODE. Many ODE solve in 10 steps for 2000s, some take a few million. Does your ODE diverge? What does it look like? Are you using an explicit method and it becomes stiff after some time? Those are the questions to look at. But yes, if you increase maxiters then it might take a bit to solve because you’re telling it to do a ton of iterations, which usually means you have (1) need to try a different integration algorithm or (2) just have a hard problem.

With no code it’s impossible to distinguish between the two.

Volker · June 25, 2021, 6:07am

I changed the solver from a variable step ode solver Tsit5 to a fixed-step ode solver RK4 which has solved the problem.

Volker · June 25, 2021, 7:48am

I also get the warning

┌ Warning: Interrupted. Larger maxiters is needed.

when I use RK4 solver with a fixed step size of 0.01 s.

The code which I used is the nearly the same as in the post

This also the code example for exogenous inputs on the documenation page of DiffEqFlux.

The just changed the solver to RK4, a slightly different excitation and updated the code to GalacticOptim.

I’m a little bit confused, because I thought, that maxiters would define the maximum iterations of the solver. Therefore, I would expect, that I would need 100 iteration when I simulated from 0.0 s to 1.0 s. Or is it the overall maximum iteration? In other words, for all iterations of optimization.

ChrisRackauckas · June 25, 2021, 9:05am

maxiters of the solver is the maximum iterations it will allow in the simulation before exiting with a warning. You want that to be as high as you need for a given ODE. It’s there to catch computations that would run forever, i.e. non-stiff ODE solver on a stiff ODE. maxiters on the optimizer is for choosing the number of steps, which isn’t (always) automatically ending.

Volker · June 28, 2021, 8:40am

That’s what I thought at first too. But then this post confused me

I’m trying to implement an automation for the training. In example, by considering your tips of avoiding local minima.

One example for step-based excitation signals could be sequentially train from one constant phase to the next.

Another idea is:

Define initial subset of training data
Subset equal whole training set => Go to Step 8
Train with subset
Simulate the whole training data set
Find point where UODE diverges
Extend subset until point where UODE diverges
Go to Step 2
Termination

In every training loop where the warning

┌ Warning: Interrupted. Larger maxiters is needed.

pops up the training is very slow/ takes quite long. Is there a way to quit the optimization loop of GalacticOptim.solve / Optim.optimize at a fixed iteration?

Do you have tried automations like this? Or do you think it is purposeful?

ChrisRackauckas · June 28, 2021, 11:04am

I think it could work. For the maxiters, what’s causing it? Stability (did you try a method for stiff equations or a different adjoint)? Divergence? Etc.

Volker · June 28, 2021, 11:12am

I mean violating a user specific threshold which rates the deviation of the model to process. Not until now, because the process which I try to approximate is not stiff.

f(x) = (atan(8.0 * x - 4.0) + atan(4.0)) / (2.0 * atan(4.0))

function hammerstein_system(u)
    y= zeros(size(u))
    for k in 2:length(u)
        y[k] = 0.2 * f(u[k-1]) + 0.8 * y[k-1]
    end
    return y
end

Does the optimization often bring the parameters of the UODE to stiff or unstable models?

ChrisRackauckas · June 28, 2021, 11:18am

It definitely could, just like normal parameter optimization.

Volker · June 28, 2021, 11:36am

Ok, so far I have not often faced it. Until now I’ve trained LMNs, MLPs, simple RNNs and LSTMs. I will investigate if the model become stifff or unstable. I would do it via an analyzing of the eigenvalues of the linearized ode. Or would you do it differently?

ChrisRackauckas · June 28, 2021, 11:37am

Look at the diverged solutions first.

Volker · June 29, 2021, 2:17pm

Hey @ChrisRackauckas, I have solved the problems I had at 29th of June (I used the whole tspan = 0.1 s to 200 s and for the loss I just considered specific indices of it e.g. 0.1s to 1.6s etc.)

function loss(p)
    sol = predict_neuralode_rk4(p)
    N = sum(idx)
    return sum(abs2.(y[idx] .- sol'))/N
end

Now I change the tspan see code below. In this case stiffness or instability weren’t problem.

However, now I get an error message of the loss function

ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 122 and 102").

I already have proved the dimensions of the prediction and the data (both 122). I don’t know the reason for the error and why it just occurs at iteration 13.

using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots, GalacticOptim

plotly()

ex = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.24, 0.24, 0.24, 0.24, 0.24, 0.62, 0.62, 0.62, 0.62, 0.62, 0.62, 0.0, 0.0, 0.0, 0.0, 0.0, 0.38, 0.38, 0.38, 0.38, 0.38, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.44, 0.44, 0.44, 0.44, 0.44, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.56, 0.56, 0.56, 0.56, 0.56, 0.42, 0.42, 0.42, 0.42, 0.42, 0.3, 0.3, 0.3, 0.3, 0.3, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88]


idxs = [16, 21, 27, 32, 37, 52, 57, 72, 77, 82, 87, 102,122]

tspan = (0.1f0, Float32(length(ex)*0.1))

## process definition
f(x) = (atan(8.0 * x - 4.0) + atan(4.0)) / (2.0 * atan(4.0))

function hammerstein_system(u)
    y= zeros(size(u))
    for k in 2:length(u)
        y[k] = 0.2 * f(u[k-1]) + 0.8 * y[k-1]
    end
    return y
end

## simulation
y = Float32.(hammerstein_system(ex))

## model design
nn_model = FastChain(FastDense(2,8, tanh), FastDense(8, 1))
p_model = initial_params(nn_model)

function dudt(u, p, t)
    nn_model(vcat(u[1], ex[Int(round(10.0*t))]), p)
end

function predict_neuralode_rk4(p)
    _prob = remake(prob,p=p)
    Array(solve(_prob, RK4(), dt = 0.01f0, saveat=[0.1f0:0.1f0:Float32(tspan[2]);]))
end

## loss function definition
loss(p) = sum(abs2.(y_sub .- vec(predict_neuralode_rk4(p))))

function train(t0, t1, dudt, p, loss)
	tspan = (t0, t1)
	u0 = [Float32.(y[1])]

    global y_sub = y[1:Int(round(t1*10.0f0))]	
    global prob = ODEProblem(dudt,u0,tspan,nothing)

    adtype = GalacticOptim.AutoZygote()
    optf = GalacticOptim.OptimizationFunction((x, p) -> loss(x), adtype)
    optfunc = GalacticOptim.instantiate_function(optf, p, adtype, nothing)
    optprob = GalacticOptim.OptimizationProblem(optfunc, p)

    res_tmp =  GalacticOptim.solve(optprob, ADAM(0.05), maxiters=50)

    optprob = remake(optprob,u0 = res_tmp.u)

    return GalacticOptim.solve(optprob, LBFGS(), allow_f_increases = false)
end

p_tmp = deepcopy(p_model)
sim_idx = 13
for i in 1:sim_idx
    println("Iteration $i - 0.1s - $(idxs[i]*0.1) s")
    res_tmp = train(0.1f0, Float32(idxs[i]*0.1), dudt, p_tmp, loss)
    p_tmp = deepcopy(res_tmp.u)
end

Furthermore, I tried to train with multiple shooting, but for this I’m getting a domain error although my specified group size is in the limits.

group_size =12
datasize = 102
continuity_term = 200
tspan = (0.1f0, Float32(102*0.1))
tsteps = range(tspan[1], tspan[2], length = idxs[12])
function loss_function(data, pred)
	return sum(abs2, data - pred)
end

function loss_multiple_shooting(p)
    return multiple_shoot(p_model, y[1:idxs[12]], tsteps, prob, loss_function, RK4(),
                          group_size; continuity_term)
end

res_ms = DiffEqFlux.sciml_train(loss_multiple_shooting, p_model)

Note that the code of multiple shooting depends on the above code.

ChrisRackauckas · July 2, 2021, 10:42pm

Can you open an issue on this one?

Volker · July 5, 2021, 9:31am

I mentioned both errors on

github.com/SciML/DiffEqFlux.jl

Training loop => ERROR: DimensionMismatch && Multiple Shooting => Domain Error

opened 09:30AM - 05 Jul 21 UTC

VoSiLk

https://discourse.julialang.org/t/what-does-the-maxiters-solver-option-do/37376/…13 Environment details: OS: Windows 10 x64 Julia Version : 1.6.0 Packages: - DiffEqFlux v1.39.0 - DiffEqSensitivity v6.52.0 - DifferentialEquations v6.17.1 - Flux v0.12.4 - GalacticOptim v1.3.0 - Optim v1.3.0 - Zygote v0.6.14 Hey, I have solved the problems I had at 29th of June (I used the whole tspan = 0.1 s to 200 s and for the loss I just considered specific indices of it e.g. 0.1s to 1.6s etc.) function loss(p) sol = predict_neuralode_rk4(p) N = sum(idx) return sum(abs2.(y[idx] .- sol'))/N end Now I change the tspan see code below. In this case stiffness or instability weren’t problem. However, now I get an error message of the loss function ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 122 and 102"). I already have proved the dimensions of the prediction and the data (both 122). I don’t know the reason for the error and why it just occurs at iteration 13. ```julia using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots, GalacticOptim plotly() ex = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.24, 0.24, 0.24, 0.24, 0.24, 0.62, 0.62, 0.62, 0.62, 0.62, 0.62, 0.0, 0.0, 0.0, 0.0, 0.0, 0.38, 0.38, 0.38, 0.38, 0.38, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.44, 0.44, 0.44, 0.44, 0.44, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.56, 0.56, 0.56, 0.56, 0.56, 0.42, 0.42, 0.42, 0.42, 0.42, 0.3, 0.3, 0.3, 0.3, 0.3, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88] idxs = [16, 21, 27, 32, 37, 52, 57, 72, 77, 82, 87, 102,122] tspan = (0.1f0, Float32(length(ex)*0.1)) ## process definition f(x) = (atan(8.0 * x - 4.0) + atan(4.0)) / (2.0 * atan(4.0)) function hammerstein_system(u) y= zeros(size(u)) for k in 2:length(u) y[k] = 0.2 * f(u[k-1]) + 0.8 * y[k-1] end return y end ## simulation y = Float32.(hammerstein_system(ex)) ## model design nn_model = FastChain(FastDense(2,8, tanh), FastDense(8, 1)) p_model = initial_params(nn_model) function dudt(u, p, t) nn_model(vcat(u[1], ex[Int(round(10.0*t))]), p) end function predict_neuralode_rk4(p) _prob = remake(prob,p=p) Array(solve(_prob, RK4(), dt = 0.01f0, saveat=[0.1f0:0.1f0:Float32(tspan[2]);])) end ## loss function definition loss(p) = sum(abs2.(y_sub .- vec(predict_neuralode_rk4(p)))) function train(t0, t1, dudt, p, loss) global tspan = (t0, t1) u0 = [Float32.(y[1])] global y_sub = y[1:Int(round(t1*10.0f0))] global prob = ODEProblem(dudt,u0,tspan,nothing) adtype = GalacticOptim.AutoZygote() optf = GalacticOptim.OptimizationFunction((x, p) -> loss(x), adtype) optfunc = GalacticOptim.instantiate_function(optf, p, adtype, nothing) optprob = GalacticOptim.OptimizationProblem(optfunc, p) res_tmp = GalacticOptim.solve(optprob, ADAM(0.05), maxiters=50) optprob = remake(optprob,u0 = res_tmp.u) return GalacticOptim.solve(optprob, LBFGS(), allow_f_increases = false) end p_tmp = deepcopy(p_model) sim_idx = 13 for i in 1:sim_idx println("Iteration $i - 0.1s - $(idxs[i]*0.1) s") res_tmp = train(0.1f0, Float32(idxs[i]*0.1), dudt, p_tmp, loss) p_tmp = deepcopy(res_tmp.u) end ``` Furthermore, I tried to train with multiple shooting, but for this I’m getting a domain error although my specified group size is in the limits. ```julia group_size =12 datasize = 102 continuity_term = 200 tspan = (0.1f0, Float32(102*0.1)) tsteps = range(tspan[1], tspan[2], length = idxs[12]) function loss_function(data, pred) return sum(abs2, data - pred) end function loss_multiple_shooting(p) return multiple_shoot(p_model, y[1:idxs[12]], tsteps, prob, loss_function, RK4(), group_size; continuity_term) end res_ms = DiffEqFlux.sciml_train(loss_multiple_shooting, p_model) ``` ERROR: DomainError with 12: group_size can't be < 2 or > number of data points Note that the code of multiple shooting depends on the above code.

gmueller · May 9, 2023, 5:41am

Hi @ChrisRackauckas,

What if I would like to run the integrator interface indefinitely?
Did I understand correctly that maxiters, no matter how large will be overshot at some point when running for a large enough time span?
How would one circumvent that?

Best, thx

ChrisRackauckas · May 9, 2023, 8:15am

Sert maxiters = Inf

gmueller · May 9, 2023, 8:26am

Simple and effective, thx!

Topic		Replies	Views
Julia DifferentialEquations: increase maxiter Numerics question , package , diffeq	6	2470	November 18, 2020
Set maxiters on OrdinaryDiffEq New to Julia question	5	1014	June 7, 2020
Strange MaxIters (numeric instability?) occured when solving certain SDE Modelling & Simulations diffeq , sde	10	815	November 14, 2020
ODE Sundials not recognizing `maxiters` argument Modelling & Simulations question , package , diffeq	2	463	July 2, 2020
DiffEq: Flux.train crashes mid-training and requests larger maxiter General Usage diffeq , flux	3	482	November 14, 2020

What does the "maxiters" solver option do?

Related topics