Differential equation becomes unstable when training mixed neural ODE

I’m trying to use a neural network to simulate the equation for the predator population in the Lotka-Volterra model:

function lotka_volterra(du, u, p, t)
  x, y = u                    # prey, predator populations
  α, β = (1.5, 1.0)           # known prey growth and predation rates
  du[1] = α*x - β*x*y         # known prey equation
  du[2] = NN(u, p)[1]         # predator equation replaced by the neural network
end

The neural network is NN = FastChain(FastDense(2, 50, tanh), FastDense(50, 1)). I’m training it against a time series generated from the true Lotka-Volterra equations, using a squared-difference loss function with sciml_train and the BFGS optimizer with initial_stepnorm=0.01f0.
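Roughly, the setup looks like this (the initial condition, time span, iteration count, and the ode_data name for the time series from the true model are placeholders, not my exact code):

using DiffEqFlux, OrdinaryDiffEq, Optim

NN = FastChain(FastDense(2, 50, tanh), FastDense(50, 1))
pinit = initial_params(NN)

u0 = Float32[1.0, 1.0]
tspan = (0.0f0, 10.0f0)
tsteps = range(tspan[1], tspan[2], length = 100)
prob = ODEProblem(lotka_volterra, u0, tspan, pinit)

# ode_data is a 2 × length(tsteps) matrix sampled from the true Lotka-Volterra model
function loss(p)
  sol = solve(prob, Tsit5(), p = p, saveat = tsteps)
  return sum(abs2, Array(sol) .- ode_data)
end

result = DiffEqFlux.sciml_train(loss, pinit, BFGS(initial_stepnorm = 0.01f0), maxiters = 100)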

It runs fine for about 120 iterations, reaching a loss of about 0.03. Then I get the message “Warning: Instability detected. Aborting”. Any idea why this is happening?

For an ODE, there are always some parameter values that make the solution diverge. For example, in u' = a*u, if a is positive and large, the solution blows up to infinity very quickly and you’ll get “Warning: Instability detected. Aborting”. So it’s a fact of life with these kinds of nonlinear models. Luckily there’s an FAQ page on good ways to handle this, such as assigning an infinite loss to divergent trajectories. See:

https://diffeqflux.sciml.ai/dev/examples/divergence/
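For instance, one simple way to do that (a sketch, reusing the prob, tsteps, and ode_data names from the post above, not the exact code from that page) is to bail out of the loss whenever the solver aborts:

function loss(p)
  sol = solve(prob, Tsit5(), p = p, saveat = tsteps)
  if sol.retcode != :Success          # solver aborted, e.g. instability detected
    return Inf
  end
  return sum(abs2, Array(sol) .- ode_data)
end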

Thanks for the clarification! I tried the example on the FAQ page and it worked for me, but when I implement the same infinite-loss approach in my own cost function, I get

MethodError: Cannot `convert` an object of type Nothing to an object of type Float32

and the code stops running rather than continuing to train. Any idea why this might be happening?

Also, I have a callback function plotting the solution at each iteration and it doesn’t look like it’s becoming unstable. Is it normal to have instability arise so suddenly?
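For reference, the callback is essentially this (a sketch, with prob and tsteps as above and the plotting details omitted):

using Plots

callback = function (p, l)
  sol = solve(prob, Tsit5(), p = p, saveat = tsteps)
  display(plot(sol))
  return false                        # returning false lets training continue
end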

With BFGS? I think @Vaibhavdixit02 was mentioning something.

This is exactly what I was seeing; it happens with AutoZygote (which is the default in sciml_train) and an Optim optimizer.

@bkuwahara pass in the AutoForwardDiff argument, like DiffEqFlux.sciml_train(loss, pinit, Newton(), GalacticOptim.AutoForwardDiff()), and try that, or else use ADAM; both of those should avoid this issue.
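In code, the two options look roughly like this (maxiters is just an example value):

using Optim, GalacticOptim

# 1. Keep a second-order optimizer but switch the AD backend to ForwardDiff
res = DiffEqFlux.sciml_train(loss, pinit, Newton(), GalacticOptim.AutoForwardDiff(); maxiters = 100)

# 2. Or switch to a first-order optimizer
res = DiffEqFlux.sciml_train(loss, pinit, ADAM(0.01); maxiters = 100)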

Or the best alternative right now would be to use the size comparison from https://diffeqflux.sciml.ai/dev/examples/divergence/ in your loss function instead of checking the retcode.
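Something along these lines (a sketch of that pattern, reusing the prob, tsteps, and ode_data names from earlier in the thread):

function loss(p)
  sol = solve(prob, Tsit5(), p = p, saveat = tsteps)
  pred = Array(sol)
  if size(pred) == size(ode_data)     # solver made it all the way through
    return sum(abs2, pred .- ode_data)
  else                                # diverged: fewer saved points than data
    return Inf
  end
end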

Thanks @Vaibhavdixit02! Using AutoForwardDiff seems to solve the problem. For some reason, the program still stops after only 20 or so iterations with a loss of about 0.3 (maxiters is set to 100), but at least it doesn’t return an error.

I’ve tried ADAM with this particular problem before, but I switched to BFGS because ADAM doesn’t seem to want to converge. Even after training with BFGS down to a loss on the order of 0.7 and then switching to ADAM(0.01), the loss immediately jumps to ~300 and stays there.

I was using the size comparison method to handle the error, and the program was still giving me the MethodError message, so I’m still not quite sure what’s causing that.

In any case, I’ll probably stick to using AutoForwardDiff and maybe try out some different network structures to see if they’re more consistently stable. I’m very new to machine learning and Julia as a whole, so I really appreciate the assistance from both of you.

Yeah, if you are using the same example, I found the best result with Newton. You can take a look at my conclusions here: Update divergence.md by Vaibhavdixit02 · Pull Request #551 · SciML/DiffEqFlux.jl · GitHub

The link is now:

https://docs.sciml.ai/SciMLSensitivity/stable/tutorials/training_tips/divergence/