Issues with computing gradient with ForwardDiff.jl (Any fixes other than ND?)

When training a model using the boundary loss function, how do you all compute gradients?
We can easily apply numerical differentiaion and get things up and running and I’ve seen a good example here. The Flux.gradient(...) to compute gradients seems very expensive and crashes on my system. I see that NeuralPDE.jl allows the option for AD and calls ForwardDiff.derivative() but it doesn’t seem to be really working on my system. As I mentioned here, it gives wrong gradients for some reason and when I try to train it on an ansatz solution, I get an error. An MWE for the ansatz version is here:

using Flux, ProgressMeter, LinearAlgebra, Plots, ForwardDiff

k= 5.;
f(t)= k*t;

u0= 100;

uNN= Chain(t->[t],
    Dense(1, 15, tanh),
    Dense(15, 25, tanh),
    Dense(25, 15, tanh),

tgrid= sort(rand(100) .*10.);
ϵ= eps(Float64)* 1e10;

# gradients ()

NN_sol(t)= u0+ t*uNN(t)

# dNN_sol(t)= sum(gradient((t)-> NN_sol(t), t)); # this crashes
dNN_sol2(t)= ForwardDiff.derivative((t)-> NN_sol(t), t); # this throws an error
dNN_sol3(t)= (NN_sol(t+ ϵ)- NN_sol(t- ϵ))/ 2ϵ
@show dNN_sol2(0.5) 
@show dNN_sol3(0.5)

loss_ODE()= Flux.mse(f.(tgrid), dNN_sol2.(tgrid));
opt= Adam(0.01);
ps= Flux.params(uNN);

epochs = 5000;
lsi= loss_ODE()
losses= [lsi];

for i in 1:epochs
    tgrid= sort(rand(50) .*10.);
    gs= gradient(loss_ODE, ps);
    Flux.Optimise.update!(opt, ps, gs);
    lsi= loss_ODE();
    push!(losses, lsi)!(p; showvalues = [(:epoch,i),
            (:grad_norm, norm(gs, 2))])

# plot(losses)

xg2= 0:0.1:10;
plot(xg2, NN_sol.(xg2), label= "NN soln", linewidth= 3)
plot!(xg2, k/2 .*xg2.^2 .+ u0, label= "analytical soln", linestyle=:dash, linewidth= 3)

The Stacktrace is as:

ERROR: LoadError: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:872
  iterate(::Union{LinRange, StepRangeLen}, ::Integer) at range.jl:872
  iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at dict.jl:712
  [1] isempty(itr::Nothing)
    @ Base ./essentials.jl:788
  [2] norm(itr::Nothing, p::Int64) (repeats 2 times)
    @ LinearAlgebra /Applications/
  [3] MappingRF
    @ ./reduce.jl:95 [inlined]
  [4] _foldl_impl(op::Base.MappingRF{typeof(norm), Base.BottomRF{typeof(max)}}, init::Base._InitialValue, itr::Zygote.Grads)
    @ Base ./reduce.jl:58
  [5] foldl_impl
    @ ./reduce.jl:48 [inlined]
  [6] mapfoldl_impl
    @ ./reduce.jl:44 [inlined]
  [7] #mapfoldl#258
    @ ./reduce.jl:162 [inlined]
  [8] mapfoldl
    @ ./reduce.jl:162 [inlined]
  [9] #mapreduce#262
    @ ./reduce.jl:294 [inlined]
 [10] mapreduce
    @ ./reduce.jl:294 [inlined]
 [11] generic_normInf(x::Zygote.Grads)
    @ LinearAlgebra /Applications/
 [12] normInf
    @ /Applications/ [inlined]
 [13] generic_norm2(x::Zygote.Grads)
    @ LinearAlgebra /Applications/
 [14] norm2
    @ /Applications/ [inlined]
 [15] norm(itr::Zygote.Grads, p::Int64)
    @ LinearAlgebra /Applications/
 [16] top-level scope
    @ ~/Desktop/MT_fwd_surrogate/pinn_ode1.jl:45
in expression starting at /Users/asingh933/Desktop/MT_fwd_surrogate/pinn_ode1.jl:39

Sorry, if this relates a lot to the previous post. This one focuses more on the issues with ForwadDiff.jl. Thanks in advance for any help you can provide!

I’m not that familiar with ForwardDiff but FastDifferentiation can handle the problem as you’ve stated it.

If the size of the problem in your MWE is typical of the actual problems you want to solve then the derivative you get from FastDifferentiation should be both compact and very fast.

If you want to give it a try I’ll help you through the initial learning curve.

Thanks @brianguenter! This looks like an interesting package. The problem I’m trying to solve, however, scales orders compared to the MWE. I might be able to make its use in some initial cases though before as I tackle my problem more and more. While I’m tackling in 1 dimension for now, it will go into 3 dimensions for realistic examples. Thanks again!