DiffEqFlux: neural_ode stops prematurely

diffeq

#1

I am trying to replicate the example in the README of DiffEqFlux (https://github.com/JuliaDiffEq/DiffEqFlux.jl). Calling the function generated with neural_ode makes Julia exit before training can begin.

The code is

using DifferentialEquations
using Flux, DiffEqFlux

function lotka_volterra(du,u,p,t)
  x, y = u
  α, β, δ, γ = p
  du[1] = dx = α*x - β*x*y
  du[2] = dy = -δ*y + γ*x*y
end
u0 = [1.0,1.0]
tspan = (0.0,10.0)
p = [1.5,1.0,3.0,1.0]
prob = ODEProblem(lotka_volterra,u0,tspan,p)
ode_data = Array(solve(prob,Tsit5(),saveat=0.1))

dudt = Chain(Dense(2,50,tanh),Dense(50,2))
tspan = (0.0f0,10.0f0)
n_ode = x->neural_ode(x,dudt,tspan,Tsit5(),saveat=0.1)

function predict_n_ode()
  n_ode(u0)
end
loss_n_ode() = sum(abs2,ode_data .- predict_n_ode())

data = Iterators.repeated((), 100)
opt = ADAM(0.1)

cb = function () # callback function to observe training
  display(loss_n_ode())
end

println("Before crashing")
n_ode(u0)
println("After crashing")


#2

That’s the old syntax (from last night, before we released). Basically, swap x and dudt:

n_ode = x->neural_ode(dudt,x,tspan,Tsit5(),saveat=0.1)

Where in the docs do we have this? It would be good to fix that.

Edit: Fixed the docs. Thanks for the report!


#3

BTW, I’d like to see what neural network you come up with to fit Lotka-Volterra. I was running the animations and recording them live on a Core i5 laptop, so I kept it to the simple case :) . When I did try to train LV with one hidden layer, the NN didn’t seem big enough to capture the function. I couldn’t use GPUs on my laptop, so I’m interested to see what kind of NN can be used here :) .

(Also, there’s a much better way to train this, but that’s the topic for another publication)


#4

Hi, thanks for the help, it works now. I cannot use a GPU either, simply because I don’t have one. So far my experience is that these networks are difficult to train. I don’t think it is the size or depth of the network. I think it is because of the nature of ODEs. Perturbations are amplified exponentially in time and that is hard to handle with any optimisation. Anyway, I will do some more experimentation before making a judgement.

My strategy would be to train with many short trajectories first and then improve on that with a smaller number of longer trajectories. At the moment I have no clue how to do multiple trajectories; my modification of the loss function does not work. If you can give an example with two trajectories, that would be great. Thanks


#5

Yup that’s definitely the case.

That’s multiple shooting. We actually do that in DiffEq-proper: http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html. We will be putting a paper out on loss functions that improve the fitting. What the blog post shows is training with single shooting, which is also what the paper shows, but we know that there are better ways :) .
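
For the two-trajectory question above, here is a minimal sketch of what such a loss could look like. It is only an illustration under stated assumptions: it uses plain Julia with a hand-written fixed-step RK4 integrator instead of DiffEqFlux, it fits the Lotka-Volterra parameters rather than a neural network, and the names lv, rk4_solve and multi_traj_loss are hypothetical helpers, not part of any library. It is also not full multiple shooting (there are no continuity conditions between segments); it only shows summing the error over several short trajectories, each with its own initial condition.

```julia
# Hypothetical sketch: none of these helper names come from DiffEqFlux.

# Lotka-Volterra right-hand side in out-of-place form.
function lv(u, p)
    x, y = u
    α, β, δ, γ = p
    return [α*x - β*x*y, -δ*y + γ*x*y]
end

# Fixed-step classical RK4; returns a length(u0) x (nsteps+1) matrix of states.
function rk4_solve(f, u0, p, dt, nsteps)
    out = zeros(length(u0), nsteps + 1)
    out[:, 1] = u0
    u = copy(u0)
    for k in 1:nsteps
        k1 = f(u, p)
        k2 = f(u .+ (dt/2) .* k1, p)
        k3 = f(u .+ (dt/2) .* k2, p)
        k4 = f(u .+ dt .* k3, p)
        u = u .+ (dt/6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        out[:, k + 1] = u
    end
    return out
end

# Sum of squared errors over several trajectories, each with its own
# initial condition and its own data matrix.
function multi_traj_loss(f, p, u0s, datas, dt)
    total = 0.0
    for (u0, data) in zip(u0s, datas)
        pred = rk4_solve(f, u0, p, dt, size(data, 2) - 1)
        total += sum(abs2, data .- pred)
    end
    return total
end

p_true = [1.5, 1.0, 3.0, 1.0]
u0s = [[1.0, 1.0], [2.0, 0.5]]        # two initial conditions
dt, nsteps = 0.01, 100                # short trajectories, t in [0, 1]
datas = [rk4_solve(lv, u0, p_true, dt, nsteps) for u0 in u0s]

loss_true = multi_traj_loss(lv, p_true, u0s, datas, dt)                # zero at the generating parameters
loss_off  = multi_traj_loss(lv, [1.2, 1.0, 3.0, 1.0], u0s, datas, dt)  # positive for perturbed ones
```

With the n_ode from this thread, the analogous loss would presumably sum sum(abs2, data_i .- n_ode(u0_i)) over the (u0_i, data_i) pairs, but that variant is an assumption and has not been run against the released package.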