I am trying to solve a simple differential equation dy/dt = -y using neural networks. The following code runs, but the loss stays at 2.0 and the network output is zero. Am I doing something wrong here?

```
using Flux
using Flux: train!
using Statistics
nn = f64(Chain(Dense(1, 16, relu), Dense(16, 16, relu), Dense(16, 1, relu)))
function loss_function(nn, t)
    prediction = nn(t)
    derivative = gradient(prediction -> sum(prediction), prediction)[1]
    initial_condition_loss = (nn([0])[1] - 1)^2
    return mean((derivative .+ prediction).^2) + initial_condition_loss
end
ts = 0:0.1:2.0
ts = reshape(ts, (1, :))
opt = Flux.setup(Adam(0.01), nn)
for epoch in 1:1000
    train!(loss_function, nn, [(ts,)], opt)
    println(loss_function(nn, ts), " ", nn(ts))
end
```

You are not calculating the derivative with respect to `t` here:

```
derivative = gradient(prediction -> sum(prediction), prediction)
```

In fact, this gradient evaluates to all ones (the derivative of `sum` with respect to each of its inputs, independent of `t`). If anything, you want something like the diagonal of `jacobian(nn, ts)`.
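For concreteness, here is a self-contained sketch of that idea using `Zygote.jacobian` (my own example, redefining a small `nn` and `ts` so it runs on its own; I use `tanh` rather than `relu` to avoid dead units). The Jacobian of the 1×N output with respect to the 1×N input is N×N, and its diagonal holds the pointwise derivatives d nn(tᵢ)/dtᵢ:

```
using Flux, Zygote, LinearAlgebra

nn = f64(Chain(Dense(1, 16, tanh), Dense(16, 1)))
ts = reshape(collect(0.0:0.1:2.0), 1, :)

# Full N×N Jacobian of the network output w.r.t. the inputs
J = Zygote.jacobian(t -> vec(nn(t)), ts)[1]

# Diagonal entry i is the derivative of nn at ts[i]
derivative = diag(J)
```

Note that this materializes the full N×N Jacobian only to keep N of its entries, which is wasteful for large N; it is fine for a small demonstration like this one.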

But then you are nesting AD passes: one over `t`, one over the parameters. I quickly tried something, but ran into Zygote errors (the infamous "mutating arrays" error).

Might I suggest a finite difference approach instead:

```
function loss_function(nn, t)
    prediction = nn(t)
    derivative = diff(prediction, dims=2) ./ diff(t, dims=2)
    initial_condition_loss = (nn([0])[1] - 1)^2
    return sum((derivative .+ prediction[:, 2:end]) .^ 2) + initial_condition_loss
end
```
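As a sanity check (my own addition, not from the original post): plugging the exact solution y(t) = exp(-t) into the same finite-difference residual should give a small bulk loss, which confirms the formulation is sensible:

```
ts = reshape(collect(0.0:0.1:2.0), 1, :)
y = exp.(-ts)                       # exact solution of dy/dt = -y, y(0) = 1
derivative = diff(y, dims=2) ./ diff(ts, dims=2)
residual = sum((derivative .+ y[:, 2:end]) .^ 2)
# residual is small but nonzero: forward differences carry O(Δt) error
```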

**edit: AD seems to work fine; see below.**

This trains well for me with `1=>8=>1` layers and `tanh` activations; `relu` reliably produces zero gradients.
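A quick illustration of why (my own toy example, not from the posts above): if the pre-activation of the final `relu` layer is negative everywhere, the output is clamped to exactly zero and `relu`'s derivative is zero there, so no gradient flows back to any parameter and training stalls. Here a hypothetical large negative bias forces that situation:

```
using Flux

# A deliberately "dead" network: the -100 bias makes every pre-activation
# of the final relu negative, so the output is identically zero.
dead = f64(Chain(Dense(1, 1), Dense(ones(1, 1), [-100.0], relu)))
x = reshape(collect(0.0:0.1:2.0), 1, :)

dead(x)                                       # all zeros
g = Flux.gradient(m -> sum(m(x)), dead)[1]
g.layers[1].weight                            # all zeros: no learning signal
```

With `tanh` the derivative is nonzero everywhere, so this failure mode cannot occur.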

(Of course, the SciML ecosystem has all the tools and conveniences for neural ODE problems readily available.)

Thanks a lot!!! Apparently, the main issue was the `relu` activation. The following code gives fantastic results in 10 epochs!

```
using Flux
using Flux: train!
using Statistics
using ForwardDiff
using Zygote
#using ProgressBar

actual(x) = exp(-2.0e0 * x)
deltax = 0.1e0
x_train = hcat(collect(Float64, 1:deltax:5)...)
initial = x_train[:, 1]
y_train = actual.(x_train)

predict = f64(Chain(Dense(1, 8, tanh),
                    Dense(8, 8, tanh),
                    Dense(8, 1, tanh)))

function loss(predict, x, y)
    f(x) = sum(predict(x))
    grads = collect(Zygote.gradient(f, x)[1])
    #println(x)
    #println(predict(x))
    #println(grads)
    ans_bulk = Flux.Losses.mse((-2.0e0) .* predict(x), grads)
    ans_bc = mean(abs2.(predict(initial) .- actual.(initial)))
    return ans_bulk + ans_bc
end

data = [(x_train, y_train)]
opt = Flux.setup(Adam(0.1), predict) #opt = Adam(0.1)
for epoch in 1:100
    train!(loss, predict, data, opt)
    println(loss(predict, x_train, y_train))
end

y_predict = predict(x_train)
for i in 1:size(x_train)[2]
    println(x_train[1, i], " ", y_train[1, i], " ", y_predict[1, i])
end
```

I am aware of the SciML ecosystem. I do not want to use a black box; I am trying to understand the basics. Once I am comfortable with my understanding of the technique, I will start using NeuralPDE.jl.
