I wouldn’t use ReverseDiff or the in-place form here. Another mistake: the solution comes back as a row vector for each output component, so in your loss function you were broadcasting a row vector against a column vector, which produces a full matrix of pairwise differences. That’s not what you wanted.
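To see the shape issue concretely, here is a standalone sketch with made-up sizes (not part of the training script below):

sol_row = ones(Float32, 1, 100)   # stands in for Array(solve(...)): 1 state × 100 save points
y_col = ones(Float32, 100)        # the data vector
size(y_col .- sol_row)            # (100, 100): an unintended matrix of pairwise differences
size(y_col .- vec(sol_row))       # (100,): elementwise residuals, which is what the loss needs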
Here’s code that trains:
using DifferentialEquations, Flux, Optim, DiffEqFlux, DiffEqSensitivity, Plots
tspan = (0.1f0, Float32(10.0))
tsteps = range(tspan[1], tspan[2], length = 100)
t_vec = collect(tsteps)
ex = vec(ones(Float32, length(tsteps), 1))   # constant exogenous input signal

# static nonlinearity of the Hammerstein system
f(x) = (atan(8.0 * x - 4.0) + atan(4.0)) / (2.0 * atan(4.0))

# discrete-time Hammerstein system used to generate the training data
function hammerstein_system(u)
    y = zeros(size(u))
    for k in 2:length(u)
        y[k] = 0.2 * f(u[k-1]) + 0.8 * y[k-1]
    end
    return y
end
y = Float32.(hammerstein_system(ex))
plot(collect(tsteps), y, ticks=:native)
# neural network that maps (state, input) to the state derivative
nn_model = FastChain(FastDense(2, 8, tanh), FastDense(8, 1))
p_model = initial_params(nn_model)

u0 = Float32.([0.0])

function dudt(u, p, t)
    # the input signal is constant here, so a fixed index into ex suffices;
    # for a time-varying input you would index by t instead, e.g. ex[Int(round(t*10)+1)]
    nn_model(vcat(u[1], ex[Int(round(10*0.1))]), p)
end
prob = ODEProblem(dudt, u0, tspan, nothing)

function predict_neuralode(p)
    _prob = remake(prob, p = p)
    Array(solve(_prob, Tsit5(), saveat = tsteps, abstol = 1e-8, reltol = 1e-6))
end

function loss(p)
    sol = predict_neuralode(p)
    N = length(sol)
    # sol is a 1×N row matrix, so transpose it before comparing against the data vector
    return sum(abs2.(y[1:N] .- sol')) / N
end
# optimize: ADAM first for a rough fit, then BFGS to refine
res0 = DiffEqFlux.sciml_train(loss, p_model, ADAM(0.01), maxiters = 100)
res1 = DiffEqFlux.sciml_train(loss, res0.minimizer, BFGS(initial_stepnorm = 0.01), maxiters = 300, allow_f_increases = true)
Flux.gradient(loss, res1.minimizer)   # check the gradient at the optimum

sol = predict_neuralode(res1.minimizer)
plot(tsteps, sol')
N = length(sol)
scatter!(tsteps, y[1:N])
savefig("trained.png")
Nope, I don’t think it’s too uncommon. Would you mind if I added this as an example to the documentation?
If you don’t have much data but you do have a model, this can be a way to get a better-predicting method, since a pure neural network would likely overfit and fail to generalize. There are plenty of other reasons too, like how discretized physics-informed neural networks are a subset of UDEs and how high-dimensional PDE solvers can be written as UDEs. See https://arxiv.org/abs/2001.04385 for all of the details.
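To make the “model plus neural network” idea concrete, here is a minimal UDE sketch in the same style as the script above. The linear decay term stands in for whatever mechanistic model you already trust and is made up purely for illustration; the network only has to learn the part of the dynamics that the model misses:

# hypothetical UDE right-hand side: known mechanistic term plus a learned correction
nn_correction = FastChain(FastDense(1, 8, tanh), FastDense(8, 1))
p_corr = initial_params(nn_correction)

function ude_dudt(u, p, t)
    known = -0.5f0 .* u                    # assumed known physics (illustrative only)
    return known .+ nn_correction(u, p)    # the network learns only the missing dynamics
end

ude_prob = ODEProblem(ude_dudt, u0, tspan, p_corr)
# trained the same way as above, e.g. with sciml_train on a loss over solve(ude_prob, ...)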