What does the "maxiters" solver option do?

New to Numerical Analysis, so please excuse the rookie question, and forgive me if this has been discussed somewhere else; I couldn’t find it.

According to the documentation:

  • maxiters : Maximum number of iterations before stopping. Defaults to 1e5.

Is this the maximum number of steps the solver will take? One idea being that if the equation is stiff you’ll need to set it to be quite large?

I’ve gotten the associated error for parameter estimation using the Optim optimizer:

┌ Warning: Interrupted. Larger maxiters is needed.

And have dealt with it successfully by increasing the maxiters value, but I’d like to know what’s going on. Related, is there a way to see what parameter values the Optim optimizer has landed on that are causing it to throw this error?

Thanks,

DS

Yes

Not necessarily. If the problem is stiff and you’re using an explicit Runge-Kutta method, then yes, it may need to be large. Usually when this limit is hit it’s avoiding useless computation: you can have a problem where dt → 1e-14 and the integrator would happily churn away for centuries, so hitting the limit tells you that you should probably change the solver choice, or that your model might not be stable (i.e. your ODEProblem might not be the one you thought you wrote down). But if you just have a long problem, like a long tspan, then there’s nothing wrong with increasing maxiters: that’s why the option is there.
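For a long but well-behaved problem, that’s just a keyword argument to solve. A minimal sketch (the ODE here is made up purely for illustration):

using DifferentialEquations

decay(u, p, t) = -0.1 * u
prob_long = ODEProblem(decay, 1.0, (0.0, 1.0e6))

# Raise the per-solve step limit above the default of 1e5
sol = solve(prob_long, Tsit5(), maxiters = 1_000_000)
sol.retcode  # :Success, or :MaxIters if the limit was still hit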

It’s a warning and not an error because, in parameter optimization, it’s common to hit parameters which just don’t make sense for your model, in which case you can be producing some wild values, so the integrator exits early. In those cases, you can just throw an infinite loss to the optimizer and it’ll know to avoid that parameter region.
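A minimal sketch of that pattern (prob and data are placeholders for your ODEProblem and measurements):

using DifferentialEquations

function loss(p)
    _prob = remake(prob, p = p)
    sol = solve(_prob, Tsit5(), saveat = 0.1)
    # If the integrator exited early (e.g. maxiters was hit), reject this region
    sol.retcode != :Success && return Inf
    return sum(abs2, Array(sol) .- data)
end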

show_trace=true or something like that in the optimizer keyword arguments.
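With Optim directly, that would look something like this (the objective here is a placeholder):

using Optim

# show_trace = true prints the objective value at every iteration
res = Optim.optimize(x -> sum(abs2, x), zeros(3), BFGS(),
                     Optim.Options(show_trace = true))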


This is awesome, thanks so much.

Sorry about the last question.

DS

I have a question: how can I simulate/solve an ODE for a quite long simulation duration, e.g. 2000 s?

I ask because I got this warning as well. Not during the optimization, but later, when I just solved the ODE for a long tspan. I tried to increase the maxiters parameter, but then the solve didn’t finish (I let it run in the background for some hours). I could imagine that simply chaining together smaller solves would work, but I think there is perhaps a more comfortable way which I don’t know.

I got the warning in the context of the problem described at the following link.

This has nothing to do with a long simulation duration and everything to do with your ODE. Many ODEs solve in 10 steps for 2000 s; some take a few million. Does your ODE diverge? What does it look like? Are you using an explicit method on a problem that becomes stiff after some time? Those are the questions to look at. But yes, if you increase maxiters then it might take a while to solve because you’re telling it to do a ton of iterations, which usually means you either (1) need to try a different integration algorithm or (2) just have a hard problem.
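If it’s (1), the usual first thing to try is a method for stiff equations, e.g. (a sketch; prob is a placeholder for your ODEProblem):

using DifferentialEquations

# Compare an explicit method against a stiff-capable one
sol_explicit = solve(prob, Tsit5())
sol_stiff = solve(prob, Rodas5())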

With no code it’s impossible to distinguish between the two.

I changed the solver from a variable-step ODE solver (Tsit5) to a fixed-step ODE solver (RK4), which solved the problem.

I also get the warning

┌ Warning: Interrupted. Larger maxiters is needed.

when I use the RK4 solver with a fixed step size of 0.01 s.

The code which I used is nearly the same as in the post

This is also the code example for exogenous inputs on the documentation page of DiffEqFlux.

I just changed the solver to RK4, used a slightly different excitation, and updated the code to GalacticOptim.

I’m a little bit confused, because I thought that maxiters would define the maximum number of iterations of the solver. Therefore I would expect that I would need 100 iterations when I simulate from 0.0 s to 1.0 s with a step size of 0.01 s. Or is it the overall maximum, i.e. across all iterations of the optimization?

maxiters of the solver is the maximum number of iterations it will allow within a single simulation before exiting with a warning. You want that to be as high as you need for a given ODE. It’s there to catch computations that would run forever, i.e. a non-stiff ODE solver on a stiff ODE. maxiters on the optimizer chooses the number of optimization steps, which doesn’t (always) terminate automatically.
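E.g. with a fixed step of 0.01 s, a single 2000 s simulation already needs 2000 / 0.01 = 200,000 steps, which is above the 1e5 default and triggers the warning on every solve. The two knobs live in different places; a minimal sketch (prob and optprob are placeholders for your ODE and optimization problems):

using DifferentialEquations, Flux, GalacticOptim

# maxiters on the ODE solve: per-simulation step limit
sol = solve(prob, RK4(), dt = 0.01, maxiters = 300_000)

# maxiters on the optimizer: number of optimization iterations taken
res = GalacticOptim.solve(optprob, ADAM(0.05), maxiters = 100)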

That’s what I thought at first too. But then this post confused me

I’m trying to implement an automation for the training, for example by considering your tips for avoiding local minima.

One example for step-based excitation signals could be to sequentially train from one constant phase to the next.

Another idea (sketched in code right after the list) is:

  1. Define an initial subset of the training data
  2. If the subset equals the whole training set => go to Step 8
  3. Train with the subset
  4. Simulate over the whole training data set
  5. Find the point where the UODE diverges
  6. Extend the subset up to the point where the UODE diverges
  7. Go to Step 2
  8. Termination
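A minimal sketch of that loop, where p_init, train_on, simulate_all, find_divergence, initial_subset_end, n_data and data are hypothetical stand-ins for the initial parameters, the training call, the full-horizon simulation, the divergence check and the data bookkeeping:

# Hypothetical growing-window training loop (helper names are stand-ins)
p_tmp = p_init
subset_end = initial_subset_end
while true
    p_tmp = train_on(p_tmp, 1:subset_end)        # Step 3: train on the current subset
    subset_end == n_data && break                # Steps 2/8: stop once the subset is everything
    sol = simulate_all(p_tmp)                    # Step 4: simulate the whole training set
    div_idx = find_divergence(sol, data)         # Step 5: first index where the UODE diverges
    subset_end = min(max(div_idx, subset_end + 1), n_data)  # Step 6: extend the subset
end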

In every training loop where the warning

┌ Warning: Interrupted. Larger maxiters is needed.

pops up, the training is very slow / takes quite long. Is there a way to quit the optimization loop of GalacticOptim.solve / Optim.optimize at a fixed iteration?

Have you tried automations like this? Or do you think it is worthwhile?

I think it could work. For the maxiters, what’s causing it? Stability (did you try a method for stiff equations or a different adjoint)? Divergence? Etc.
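For instance, switching the solve inside the loss to a stiff-capable method with an explicit adjoint choice would be a sketch like this (reusing the remake pattern from the code above; not necessarily the right pick for this model):

using DifferentialEquations, DiffEqSensitivity

function predict_stiff(p)
    _prob = remake(prob, p = p)
    Array(solve(_prob, TRBDF2(), saveat = 0.1f0,
                sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true))))
end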


I mean violating a user-specific threshold which rates the deviation of the model from the process. Not so far, because the process which I try to approximate is not stiff:

# Static nonlinearity of the Hammerstein system
f(x) = (atan(8.0 * x - 4.0) + atan(4.0)) / (2.0 * atan(4.0))

function hammerstein_system(u)
    # Discrete-time Hammerstein process: static nonlinearity f followed by
    # first-order linear dynamics
    y = zeros(size(u))
    for k in 2:length(u)
        y[k] = 0.2 * f(u[k-1]) + 0.8 * y[k-1]
    end
    return y
end

Does the optimization often bring the parameters of the UODE to stiff or unstable models?

It definitely could, just like normal parameter optimization.


Ok, so far I have not often faced it. Until now I’ve trained LMNs, MLPs, simple RNNs and LSTMs. I will investigate whether the models become stiff or unstable. I would do it by analyzing the eigenvalues of the linearized ODE, something like the sketch below.
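A minimal sketch of what I mean (assuming ForwardDiff for the Jacobian; dudt and p are from my code, u0 and t are whatever point I inspect):

using LinearAlgebra, ForwardDiff

# Linearize the ODE right-hand side around state u0 at time t
J = ForwardDiff.jacobian(u -> dudt(u, p, t), u0)
eigvals(J)  # widely spread negative real parts hint at stiffness,
            # positive real parts at local instability

Or would you do it differently?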

Look at the diverged solutions first.


Hey @ChrisRackauckas, I have solved the problems I had on the 29th of June (I used the whole tspan = 0.1 s to 200 s and for the loss I just considered specific indices of it, e.g. 0.1 s to 1.6 s etc.):

function loss(p)
    sol = predict_neuralode_rk4(p)
    N = sum(idx)  # assuming idx is a boolean mask, this counts the selected points
    return sum(abs2.(y[idx] .- sol')) / N
end

Now I change the tspan, see the code below. In this case stiffness or instability weren’t the problem.

However, now I get an error message from the loss function:

ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 122 and 102").

I have already checked the dimensions of the prediction and the data (both 122). I don’t know the reason for the error, nor why it only occurs at iteration 13.

using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots, GalacticOptim

plotly()

ex = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.24, 0.24, 0.24, 0.24, 0.24, 0.62, 0.62, 0.62, 0.62, 0.62, 0.62, 0.0, 0.0, 0.0, 0.0, 0.0, 0.38, 0.38, 0.38, 0.38, 0.38, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.78, 0.44, 0.44, 0.44, 0.44, 0.44, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.56, 0.56, 0.56, 0.56, 0.56, 0.42, 0.42, 0.42, 0.42, 0.42, 0.3, 0.3, 0.3, 0.3, 0.3, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88, 0.88]


idxs = [16, 21, 27, 32, 37, 52, 57, 72, 77, 82, 87, 102, 122]

tspan = (0.1f0, Float32(length(ex)*0.1))

## process definition
f(x) = (atan(8.0 * x - 4.0) + atan(4.0)) / (2.0 * atan(4.0))

function hammerstein_system(u)
    y = zeros(size(u))
    for k in 2:length(u)
        y[k] = 0.2 * f(u[k-1]) + 0.8 * y[k-1]
    end
    return y
end

## simulation
y = Float32.(hammerstein_system(ex))

## model design
nn_model = FastChain(FastDense(2,8, tanh), FastDense(8, 1))
p_model = initial_params(nn_model)

function dudt(u, p, t)
    nn_model(vcat(u[1], ex[Int(round(10.0*t))]), p)
end

function predict_neuralode_rk4(p)
    _prob = remake(prob,p=p)
    Array(solve(_prob, RK4(), dt = 0.01f0, saveat=[0.1f0:0.1f0:Float32(tspan[2]);]))
end

## loss function definition
loss(p) = sum(abs2.(y_sub .- vec(predict_neuralode_rk4(p))))

function train(t0, t1, dudt, p, loss)
    tspan = (t0, t1)
    u0 = [Float32.(y[1])]

    global y_sub = y[1:Int(round(t1*10.0f0))]
    global prob = ODEProblem(dudt, u0, tspan, nothing)

    adtype = GalacticOptim.AutoZygote()
    optf = GalacticOptim.OptimizationFunction((x, p) -> loss(x), adtype)
    optfunc = GalacticOptim.instantiate_function(optf, p, adtype, nothing)
    optprob = GalacticOptim.OptimizationProblem(optfunc, p)

    res_tmp = GalacticOptim.solve(optprob, ADAM(0.05), maxiters=50)

    optprob = remake(optprob,u0 = res_tmp.u)

    return GalacticOptim.solve(optprob, LBFGS(), allow_f_increases = false)
end

p_tmp = deepcopy(p_model)
sim_idx = 13
for i in 1:sim_idx
    println("Iteration $i - 0.1s - $(idxs[i]*0.1) s")
    res_tmp = train(0.1f0, Float32(idxs[i]*0.1), dudt, p_tmp, loss)
    p_tmp = deepcopy(res_tmp.u)
end

Furthermore, I tried to train with multiple shooting, but for this I’m getting a domain error, although my specified group size is within the limits.

group_size = 12
datasize = 102
continuity_term = 200
tspan = (0.1f0, Float32(102*0.1))
tsteps = range(tspan[1], tspan[2], length = idxs[12])
function loss_function(data, pred)
	return sum(abs2, data - pred)
end

function loss_multiple_shooting(p)
    return multiple_shoot(p, y[1:idxs[12]], tsteps, prob, loss_function, RK4(),
                          group_size; continuity_term)
end

res_ms = DiffEqFlux.sciml_train(loss_multiple_shooting, p_model)

Note that the multiple shooting code depends on the code above.

Can you open an issue on this one?

I mentioned both errors on

Hi @ChrisRackauckas,

What if I would like to run the integrator interface indefinitely?
Did I understand correctly that maxiters, no matter how large, will be overshot at some point when running for a large enough time span?
How would one circumvent that?

Best, thx

Set maxiters = Inf
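With the integrator interface that looks something like this minimal sketch (the ODE is a placeholder, and the loop bound stands in for your real stopping condition):

using DifferentialEquations

decay(u, p, t) = -0.1 * u
# An open-ended tspan plus maxiters = Inf lets the integrator run indefinitely
integrator = init(ODEProblem(decay, 1.0, (0.0, Inf)), Tsit5(), maxiters = Inf)

for _ in 1:10  # stand-in for a real stopping condition
    step!(integrator)
    @show integrator.t integrator.u
end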


Simple and effective, thx!