Issues with computing gradient with ForwardDiff.jl (Any fixes other than ND?)

ayushinav · December 23, 2023, 10:46pm

When training a model using the boundary loss function, how do you all compute gradients?
We can easily apply numerical differentiaion and get things up and running and I’ve seen a good example here. The Flux.gradient(...) to compute gradients seems very expensive and crashes on my system. I see that NeuralPDE.jl allows the option for AD and calls ForwardDiff.derivative() but it doesn’t seem to be really working on my system. As I mentioned here, it gives wrong gradients for some reason and when I try to train it on an ansatz solution, I get an error. An MWE for the ansatz version is here:

using Flux, ProgressMeter, LinearAlgebra, Plots, ForwardDiff

k= 5.;
f(t)= k*t;

u0= 100;

uNN= Chain(t->[t],
    Dense(1, 15, tanh),
    Dense(15, 25, tanh),
    Dense(25, 15, tanh),
    Dense(15,1),
    first
)

tgrid= sort(rand(100) .*10.);
ϵ= eps(Float64)* 1e10;

# gradients ()

NN_sol(t)= u0+ t*uNN(t)

# dNN_sol(t)= sum(gradient((t)-> NN_sol(t), t)); # this crashes
dNN_sol2(t)= ForwardDiff.derivative((t)-> NN_sol(t), t); # this throws an error
dNN_sol3(t)= (NN_sol(t+ ϵ)- NN_sol(t- ϵ))/ 2ϵ
@show dNN_sol2(0.5) 
@show dNN_sol3(0.5)

loss_ODE()= Flux.mse(f.(tgrid), dNN_sol2.(tgrid));
opt= Adam(0.01);
ps= Flux.params(uNN);

epochs = 5000;
lsi= loss_ODE()
losses= [lsi];

ProgressMeter.ijulia_behavior(:clear)
p=Progress(epochs)
for i in 1:epochs
    tgrid= sort(rand(50) .*10.);
    gs= gradient(loss_ODE, ps);
    Flux.Optimise.update!(opt, ps, gs);
    lsi= loss_ODE();
    push!(losses, lsi)
    ProgressMeter.next!(p; showvalues = [(:epoch,i),
            (:train_loss,lsi), 
            (:grad_norm, norm(gs, 2))])
end

# plot(losses)

xg2= 0:0.1:10;
plot(xg2, NN_sol.(xg2), label= "NN soln", linewidth= 3)
plot!(xg2, k/2 .*xg2.^2 .+ u0, label= "analytical soln", linestyle=:dash, linewidth= 3)

The Stacktrace is as:

ERROR: LoadError: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:872
  iterate(::Union{LinRange, StepRangeLen}, ::Integer) at range.jl:872
  iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at dict.jl:712
  ...
Stacktrace:
  [1] isempty(itr::Nothing)
    @ Base ./essentials.jl:788
  [2] norm(itr::Nothing, p::Int64) (repeats 2 times)
    @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/generic.jl:591
  [3] MappingRF
    @ ./reduce.jl:95 [inlined]
  [4] _foldl_impl(op::Base.MappingRF{typeof(norm), Base.BottomRF{typeof(max)}}, init::Base._InitialValue, itr::Zygote.Grads)
    @ Base ./reduce.jl:58
  [5] foldl_impl
    @ ./reduce.jl:48 [inlined]
  [6] mapfoldl_impl
    @ ./reduce.jl:44 [inlined]
  [7] #mapfoldl#258
    @ ./reduce.jl:162 [inlined]
  [8] mapfoldl
    @ ./reduce.jl:162 [inlined]
  [9] #mapreduce#262
    @ ./reduce.jl:294 [inlined]
 [10] mapreduce
    @ ./reduce.jl:294 [inlined]
 [11] generic_normInf(x::Zygote.Grads)
    @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/generic.jl:453
 [12] normInf
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/generic.jl:522 [inlined]
 [13] generic_norm2(x::Zygote.Grads)
    @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/generic.jl:463
 [14] norm2
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/generic.jl:524 [inlined]
 [15] norm(itr::Zygote.Grads, p::Int64)
    @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/generic.jl:593
 [16] top-level scope
    @ ~/Desktop/MT_fwd_surrogate/pinn_ode1.jl:45
in expression starting at /Users/asingh933/Desktop/MT_fwd_surrogate/pinn_ode1.jl:39

Sorry, if this relates a lot to the previous post. This one focuses more on the issues with ForwadDiff.jl. Thanks in advance for any help you can provide!

brianguenter · December 24, 2023, 12:21am

I’m not that familiar with ForwardDiff but FastDifferentiation can handle the problem as you’ve stated it.

If the size of the problem in your MWE is typical of the actual problems you want to solve then the derivative you get from FastDifferentiation should be both compact and very fast.

If you want to give it a try I’ll help you through the initial learning curve.

ayushinav · December 24, 2023, 4:30am

Thanks @brianguenter! This looks like an interesting package. The problem I’m trying to solve, however, scales orders compared to the MWE. I might be able to make its use in some initial cases though before as I tackle my problem more and more. While I’m tackling in 1 dimension for now, it will go into 3 dimensions for realistic examples. Thanks again!

Topic		Replies	Views
Gradient error in Flux model inputs Machine Learning question , flux , zygote	5	1324	January 13, 2021
Applying Reverse-Mode AD to ForwardDiff Derivatives/Gradients New to Julia flux , forwarddiff , chainrulescore	0	368	May 3, 2022
Loss functions that involve gradients Machine Learning	4	606	November 22, 2022
Is it possible perform reverse mode differentiation (Flux.jl with Zygote.jl) of a forward mode differentiation result (e.g. ForwardDiff)? Machine Learning question , flux	3	1446	March 10, 2020
Getting gradients with loss using for-loop is slow in Flux.jl Machine Learning	4	278	August 7, 2023

Issues with computing gradient with ForwardDiff.jl (Any fixes other than ND?)

Related topics