Ahh, @SebastianCallh, I got it. In your example, the issue is that no ODE of this form can have your data as a solution. Remember that the flow lines of an ODE can never cross in phase space. Since your state is 1-dimensional and this neural ODE is defined as u' = f(u), the derivative at a given value of u is a single fixed number, so not every function of time can be represented. If the derivative is positive at u = 1.35, it cannot also be negative there; that forces the solution to be monotone, so a non-monotone time series cannot be fit.
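To make that concrete, here is a small sketch (the u and t values are rounded from your data further down) that checks whether some value of u is crossed both while increasing and while decreasing; if it is, no autonomous f(u) can reproduce the series:

# Sketch: for u' = f(u), the sign of u' depends only on u. If the data passes
# through the same u-value once going up and once going down, no such f exists.
u = [-1.388, -1.327, -1.325, -1.383, -1.415, -1.475, -1.471, -1.424, -1.393, -1.361]
t = [0.0, 0.0047, 0.0093, 0.0187, 0.0234, 0.0282, 0.0375, 0.0421, 0.0468, 0.0516]
slopes = diff(u) ./ diff(t)
ranges = [extrema((u[i], u[i+1])) for i in 1:length(u)-1]
for i in 1:length(ranges), j in i+1:length(ranges)
    lo, hi = max(ranges[i][1], ranges[j][1]), min(ranges[i][2], ranges[j][2])
    if lo < hi && sign(slopes[i]) != sign(slopes[j])
        println("u in ($lo, $hi) is crossed both upward and downward")
    end
end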
This suggests the correct strategies, though. One option is to use u' = f(u, t), but this has the downside of making the model explicitly time-dependent, so it won't predict well. The right way to do this is to use Augmented Neural ODEs, which add extra dimensions to the state so the flow has room to avoid crossing itself. See the tutorial in the documentation for information on how to use it:
https://diffeqflux.sciml.ai/dev/examples/augmented_neural_ode/
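(For completeness, the time-dependent variant would look roughly like the sketch below, just feeding t into the network as an extra input. This is only to illustrate the u' = f(u, t) idea, not the recommended route.)

# Sketch of the time-dependent alternative u' = f(u, t): the network takes the
# state and the time. It can fit non-monotone data, but it is explicitly
# time-dependent and tends to extrapolate poorly.
using DiffEqFlux, OrdinaryDiffEq, Flux

f = FastChain(FastDense(2, 100, swish), FastDense(100, 1))
dudt(u, p, t) = f(vcat(u, t), p)          # input is [u; t]
p0 = initial_params(f)
prob = ODEProblem(dudt, [-1.388], (0.0, 0.0516), p0)
sol = solve(prob, Tsit5())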
Here I show how I would use augmented neural ODEs on your example:
using DiffEqFlux, OrdinaryDiffEq, Flux, Optim, Plots
function node_model(t)
    # 2-in/2-out network: the observed state plus one augmented dimension
    dudt2 = FastChain(FastDense(2, 100, swish),
                      FastDense(100, 2))
    # AugmentedNDELayer adds 1 extra dimension to the 1D state so the flow
    # is no longer forced to be monotone
    node = AugmentedNDELayer(NeuralODE(dudt2, (minimum(t), maximum(t)),
                                       Tsit5(), saveat = t,
                                       abstol = 1e-12, reltol = 1e-12), 1)
end
function plot_pred(t, u, û)
    plt = scatter(reshape(t, :), reshape(u, :),
                  label = "Data")
    scatter!(plt, reshape(t, :), reshape(û, :),
             label = "Prediction")
end
function train(node, t, u, p = nothing; maxiters, optimizer, kwargs...)
    u0 = u[1:1]                          # initial condition (observed dimension only)
    predict(p) = Array(node(u0, p))
    loss(p) = begin
        û = predict(p)[1, :]             # compare only the observed dimension
        l = sum(abs2, vec(û) - vec(u))
        return l, û
    end
    losses = []
    cb(p, l, û) = begin
        @show l
        push!(losses, l)
        false                            # returning false keeps the optimization going
    end
    p = p === nothing ? node.p : p       # start from the layer's parameters if none given
    res = DiffEqFlux.sciml_train(loss, p, optimizer;
                                 maxiters = maxiters,
                                 cb = cb,
                                 kwargs...)
    p_pred = plot_pred(t, u, predict(res.minimizer)[1, :])
    plot!(p_pred, legend = (.11, .9))
    p_loss = plot(losses, label = "Loss")
    display(plot(p_pred, p_loss, layout = (2, 1)))
    return res.minimizer
end
# the first ten observations after normalization
t = [
0.0
0.0047485508630894695
0.009340678874206868
0.018675771218000677
0.02341873555066439
0.028161699883315395
0.03750237875753497
0.042094506768665066
0.04683747110131607
0.051580435433979784
]
u = [
-1.388342887543077
-1.3271350629309493
-1.3250365089442477
-1.3827467435785388
-1.4152743303724125
-1.4754328779911903
-1.4712357700177872
-1.4243680643147854
-1.3925399955164801
-1.3607119267181729
]
# train on the first k observations
k = 3
node = node_model(t[1:k])
# a few ADAM iterations to get into a good region, then BFGS to converge
p = train(node, t[1:k], u[1:k],
          maxiters = 10, optimizer = ADAM(.1));
p2 = train(node, t[1:k], u[1:k], p,
           maxiters = 1000, optimizer = BFGS(initial_stepnorm = 0.01));
# add one more observation, continuing from the previous parameters
k = 4
node = node_model(t[1:k])
p3 = train(node, t[1:k], u[1:k], p2,
           maxiters = 1000, optimizer = BFGS(initial_stepnorm = 0.01));
with a resulting loss:
l = 2.6409047003451676e-8
l = 2.2702601565538227e-9
l = 1.3131485517490487e-9
l = 6.922936097395577e-12
l = 1.1102737720276462e-13
l = 4.0216421072999674e-17
which is essentially zero.
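If you want to pull out the fitted trajectory yourself (outside the plotting done inside train), here is a small sketch mirroring the predict closure above:

# Evaluate the trained augmented neural ODE with the final parameters; the
# first row of the output is the observed dimension at the saved time points.
û = Array(node(u[1:1], p3))[1, :]
scatter(t[1:4], u[1:4], label = "Data")
plot!(t[1:4], û, label = "Fit")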
Hope this makes a lot more sense now!