I am trying to solve a simple differential equation dy/dt = -y using neural networks. The following code runs, but the loss stays at 2.0 and the network output is zero. Am I doing something wrong here?

```
using Flux
using Flux: train!
using Statistics
nn = f64(Chain(Dense(1, 16, relu), Dense(16, 16, relu), Dense(16, 1, relu)))
function loss_function(nn, t)
    prediction = nn(t)
    derivative = gradient(prediction -> sum(prediction), prediction)[1]
    initial_condition_loss = (nn([0])[1] - 1)^2
    return mean((derivative .+ prediction).^2) + initial_condition_loss
end
ts = 0:0.1:2.0
ts = reshape(ts, (1, :))
opt = Flux.setup(Adam(0.01), nn)
for epoch in 1:1000
    train!(loss_function, nn, [(ts,)], opt)
    println(loss_function(nn, ts), " ", nn(ts))
end
```

You are not calculating the derivative with respect to `t` here:

```
derivative = gradient(prediction -> sum(prediction), prediction)
```

In fact, this gradient evaluates to all ones (the derivative of `sum` with respect to each of its inputs, independent of `t`). If anything, you want something like the diagonal of `jacobian(nn, ts)`.
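For concreteness, here is a self-contained sketch of that idea using `Zygote.jacobian` (my own example, redefining a small `nn` and `ts` so it runs on its own; I use `tanh` rather than `relu` to avoid dead units). The Jacobian of the 1×N output with respect to the 1×N input is N×N, and its diagonal holds the pointwise derivatives d nn(tᵢ)/dtᵢ:

```
using Flux, Zygote, LinearAlgebra

nn = f64(Chain(Dense(1, 16, tanh), Dense(16, 1)))
ts = reshape(collect(0.0:0.1:2.0), 1, :)

# Full N×N Jacobian of the network output w.r.t. the inputs
J = Zygote.jacobian(t -> vec(nn(t)), ts)[1]

# Diagonal entry i is the derivative of nn at ts[i]
derivative = diag(J)
```

Note that this materializes the full N×N Jacobian only to keep N of its entries, which is wasteful for large N; it is fine for a small demonstration like this one.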

But then you are nesting AD passes: one over `t`, one over the parameters. I quickly tried something, but ran into Zygote errors (the infamous "mutating arrays" error).

Might I suggest a finite difference approach instead:

```
function loss_function(nn, t)
    prediction = nn(t)
    derivative = diff(prediction, dims=2) ./ diff(t, dims=2)
    initial_condition_loss = (nn([0])[1] - 1)^2
    return sum((derivative .+ prediction[:, 2:end]) .^ 2) + initial_condition_loss
end
```
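As a sanity check (my own addition, not from the original post): plugging the exact solution y(t) = exp(-t) into the same finite-difference residual should give a small bulk loss, which confirms the formulation is sensible:

```
ts = reshape(collect(0.0:0.1:2.0), 1, :)
y = exp.(-ts)                       # exact solution of dy/dt = -y, y(0) = 1
derivative = diff(y, dims=2) ./ diff(ts, dims=2)
residual = sum((derivative .+ y[:, 2:end]) .^ 2)
# residual is small but nonzero: forward differences carry O(Δt) error
```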

**edit: AD seems to work fine; see below.**

This trains well for me with `1=>8=>1` layers and `tanh` activations; `relu` reliably produces zero gradients.
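A quick illustration of why (my own toy example, not from the posts above): if the pre-activation of the final `relu` layer is negative everywhere, the output is clamped to exactly zero and `relu`'s derivative is zero there, so no gradient flows back to any parameter and training stalls. Here a hypothetical large negative bias forces that situation:

```
using Flux

# A deliberately "dead" network: the -100 bias makes every pre-activation
# of the final relu negative, so the output is identically zero.
dead = f64(Chain(Dense(1, 1), Dense(ones(1, 1), [-100.0], relu)))
x = reshape(collect(0.0:0.1:2.0), 1, :)

dead(x)                                       # all zeros
g = Flux.gradient(m -> sum(m(x)), dead)[1]
g.layers[1].weight                            # all zeros: no learning signal
```

With `tanh` the derivative is nonzero everywhere, so this failure mode cannot occur.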

(Of course, the SciML ecosystem has all the tools and conveniences for neural ODE problems readily available.)

Thanks a lot!!! Apparently, the main issue was the `relu` activation. The following code gives fantastic results in 10 epochs!

```
using Flux
using Flux: train!
using Statistics
using ForwardDiff
using Zygote
#using ProgressBar

actual(x) = exp(-2.0e0 * x)
deltax = 0.1e0
x_train = hcat(collect(Float64, 1:deltax:5)...)
initial = x_train[:, 1]
y_train = actual.(x_train)

predict = f64(Chain(Dense(1, 8, tanh),
                    Dense(8, 8, tanh),
                    Dense(8, 1, tanh)))

function loss(predict, x, y)
    f(x) = sum(predict(x))
    grads = collect(Zygote.gradient(f, x)[1])
    #println(x)
    #println(predict(x))
    #println(grads)
    ans_bulk = Flux.Losses.mse((-2.0e0) .* predict(x), grads)
    ans_bc = mean(abs2.(predict(initial) .- actual.(initial)))
    return ans_bulk + ans_bc
end

data = [(x_train, y_train)]
opt = Flux.setup(Adam(0.1), predict) #opt = Adam(0.1)
for epoch in 1:100
    train!(loss, predict, data, opt)
    println(loss(predict, x_train, y_train))
end

y_predict = predict(x_train)
for i in 1:size(x_train)[2]
    println(x_train[1, i], " ", y_train[1, i], " ", y_predict[1, i])
end
```

I am aware of the SciML ecosystem. I do not want to use a black box; I am trying to understand the basics. Once I am comfortable with my understanding of the technique, I will start using NeuralPDE.jl.
