To understand neural ODE

I find I still don’t fully understand neural ordinary differential equations.

As shown in the literature (Lu et al., 2017; Haber and Ruthotto, 2017; Chen et al., 2018), a sequence of
transformations
h(t+1) = h(t) + f(h(t), theta)
can be turned into an ODE in the limiting case
dh/dt = f(h(t), theta)

This formulation holds only if theta is also allowed to be a function of time.
In that sense we can say we’ve got an equivalent block of infinite depth.
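
To make the connection concrete: the residual update above is exactly one explicit Euler step of size 1 for this ODE. A minimal sketch (f and euler_step are illustrative names of mine, not from any library):

f(h, theta) = tanh.(theta * h)                 # some parameterized transformation
euler_step(h, theta, dt) = h + dt * f(h, theta)

theta = randn(2, 2)
h0 = [2.0, 0.0]
h1 = euler_step(h0, theta, 1.0)                # with dt = 1 this is h(t+1) = h(t) + f(h(t), theta)
# shrinking dt and taking more steps with the same theta approaches dh/dt = f(h(t), theta)
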
However, in practice we just define a network with a fixed number of parameters.
For example, in the following code,

using DiffEqFlux, OrdinaryDiffEq
dudt = FastChain(FastDense(2, 50, tanh), FastDense(50, 2))  # f(h, p): 2 -> 50 -> 2
u0 = Float32[2.0; 0.0]                                      # initial state h(0)
tspan = (0.0f0, 1.0f0)                                      # integrate from t = 0 to t = 1
nn = NeuralODE(dudt, tspan, Tsit5())                        # dh/dt = dudt(h, p), solved with Tsit5

nn.p is a fixed-size vector with 252 entries.
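
A quick sanity check on that count (each FastDense carries a weight matrix plus a bias vector):

length(nn.p)              # 252
(2*50 + 50) + (50*2 + 2)  # 252: first FastDense + second FastDense
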
Although the output of the network is fed back in as the input at each step of the Tsit5 solve, the same parameters are used throughout.
Does the neural ODE have more learning capacity than a single layer, or are they just the same in this case?

It’s n layers, where n is determined adaptively by the neural network.

Could you elaborate a bit on the adaptivity?
To me it seems everything is fixed after nn is defined in my demo code.

It defines an ODE which is solved by an adaptive ODE solver. However many steps the solver decides to take, that’s how many layers you effectively have.
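
One way to see this concretely, assuming the API from the snippet above, where calling the NeuralODE runs the solve and returns the solution object:

sol = nn(u0)         # forward pass: integrate dh/dt = dudt(h, nn.p) over tspan
sol.t                # the time points Tsit5 chose adaptively for this input
length(sol.t) - 1    # number of accepted steps, i.e. the effective “depth” here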

Yeah, now I get what you mean. In fact, this is exactly what I’m asking.
Sure, with different ODE solvers we can have a different number of steps, or say “n layers”, but all of these layers rely on the same parameters of dudt = FastChain(FastDense(2, 50, tanh), FastDense(50, 2)).
Are they really as good as a neural network that has n real layers (with, of course, n times as many parameters), e.g. FastChain(dudt, dudt, ...)?
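
For comparison, here is a sketch of the stacked network the question describes, with the copies written out explicitly so their parameters are independent; the count grows linearly with the number of copies:

stacked = FastChain(FastDense(2, 50, tanh), FastDense(50, 2),
                    FastDense(2, 50, tanh), FastDense(50, 2),
                    FastDense(2, 50, tanh), FastDense(50, 2))
length(initial_params(stacked))   # 3 * 252 = 756, vs. the 252 entries of nn.p reused at every step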

It’s then similar to recurrent models.
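
The analogy can be made concrete with a fixed-step, weight-tied unrolling; this sketch relies only on the FastChain call signature dudt(u, p) from the code above (unrolled is a made-up name):

p = initial_params(dudt)
function unrolled(u, p; steps = 10, dt = 0.1f0)
    for _ in 1:steps
        u = u .+ dt .* dudt(u, p)   # the same p at every step, like a weight-tied RNN cell
    end
    return u
end
unrolled(u0, p)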
