Higher order derivatives / automatic differentiation

Thanks @tomaklutfu, I realize that the λ's I initialised are not the best ones and that they do zero everything out. However, I tried other values as well, and while I don't claim to have found the right balance, the ansatz solution u(t=0) + (t - 0)·u_NN(t) also does not work very well for me (a quick sketch of what I mean is at the end of this post). I had looked at the thread you mentioned before posting my question and could not find much help there. My bigger concern, however, is that even with just a first-order derivative, ForwardDiff.derivative() interferes with the gradients in a way that goes beyond my understanding. Here's an MWE for a simpler problem, where I try to solve u'(t) = -k*t with u(0) = 100:

using Flux, ForwardDiff, LinearAlgebra, Plots, BenchmarkTools, ProgressMeter
k= 2.;
f(t)= -k*t;

u0= 100;

u_NN= Chain(t->[t],
    Dense(1, 15, tanh),
    Dense(15, 25, tanh),
    Dense(25, 15, tanh),
    Dense(15,1),
    first
)

tgrid= sort(rand(100) .*10.);
ϵ= eps(Float64)* 1e10;

# three ways of taking the derivative of u_NN w.r.t. t

du_NN2(t)= ForwardDiff.derivative((t)-> u_NN(t), t);
du_NN(t)= sum(gradient((t)-> u_NN(t), t)); # Zygote version; found the sum trick somewhere to circumvent the getindex() problem with the tuple that gradient returns
du_NN3(t)= (u_NN(t+ ϵ)- u_NN(t- ϵ))/ 2ϵ
@show du_NN2(0.5) 
@show du_NN3(0.5)

loss_ODE()= Flux.mse(f.(tgrid), du_NN3.(tgrid));
loss_BC()= abs2(u_NN(0.)- u0)

loss()= loss_ODE()+ 0.001* loss_BC();

opt= Adam(0.01);
ps= Flux.params(u_NN)

epochs = 5000;
lsi= loss()
losses= [lsi];
ProgressMeter.ijulia_behavior(:clear)
p=Progress(epochs)
for i in 1:epochs
    tgrid= sort(rand(50) .*10.);
    gs= gradient(loss, ps);
    Flux.Optimise.update!(opt, ps, gs);
    lsi= loss();
    push!(losses, lsi)
    ProgressMeter.next!(p; showvalues = [(:epoch,i),
            (:train_loss,lsi), 
            (:grad_norm, norm(gs, 2))])
end

# plot(losses)

xg2= 0:0.01:10;
plot(xg2, u_NN.(xg2), label= "NN soln")
plot!(xg2, -k/2 .*xg2.^2 .+ u0, label= "analytical soln")  # u(t) = u0 - k*t^2/2

This does not converge well when I use the ForwardDiff-based version of the ODE loss instead:

loss_ODE()= Flux.mse(f.(tgrid), du_NN2.(tgrid));

The derivatives computed by numerical differencing and by ForwardDiff come out pretty close, yet the latter somehow interferes with the Flux gradients needed to train the network. One can easily collect the norm of the parameter gradients under the two implementations and see that this is the case.
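For concreteness, this is roughly how I collect those gradient norms for the comparison (grad_norm, loss_num and loss_fd are just helper names I'm using for this sketch, not anything from Flux):

# helper: 2-norm over all parameter gradients, treating missing gradients as zero
grad_norm(gs, ps)= sqrt(sum(gs[p] === nothing ? 0.0 : sum(abs2, gs[p]) for p in ps));

loss_num()= Flux.mse(f.(tgrid), du_NN3.(tgrid)) + 0.001* loss_BC();  # central-difference version
loss_fd()= Flux.mse(f.(tgrid), du_NN2.(tgrid)) + 0.001* loss_BC();   # ForwardDiff.derivative version

gs_num= gradient(loss_num, ps);
gs_fd= gradient(loss_fd, ps);

@show grad_norm(gs_num, ps)
@show grad_norm(gs_fd, ps)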
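And for completeness, this is roughly what I mean by the ansatz/trial-solution approach mentioned at the top (u_hat, du_hat and loss_ansatz are just names I'm using here): since û(t) = u0 + t·u_NN(t) satisfies the initial condition by construction, the BC penalty can be dropped and only the ODE residual is trained on.

u_hat(t)= u0 + t* u_NN(t);                        # satisfies u_hat(0) == u0 by construction
du_hat(t)= (u_hat(t+ ϵ)- u_hat(t- ϵ))/ 2ϵ;        # same central difference as above
loss_ansatz()= Flux.mse(f.(tgrid), du_hat.(tgrid));  # no separate loss_BC term needed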