How to get second order gradient of a neural network?


I am trying to plot the second-order derivative of a single-input, single-output neural network. I can compute the first-order derivative just fine, but the second-order derivative computed with the Tracker package is simply 0, and the true second derivative should not be 0 everywhere. Maybe this is a vanishing-gradient issue? To reproduce, you can quickly train a similar network with this code:

using Flux, Tracker, CuArrays

net = Chain(Dense(1, 256, relu),
            Dense(256, 256, relu),
            Dense(256, 1, relu)) |> gpu

S = 70
s = 15
function opt(x)
    if x <= s
        return S - x
    else
        return 0
    end
end

function loss(x, y)
    ÿ = sum(net(x), dims = 1)
    Flux.mse(ÿ, y)  # return a scalar loss
end

x = hcat(collect(-50f0:0.1f0:100f0)...)
y = opt.(x)
x = cu(x)
y = hcat(y...) |> gpu
op = ADAM()  # the optimiser; the original snippet used `op` without defining it
Flux.train!(loss, Flux.params(net), Iterators.repeated((x, y), 1000), op)

Plot the estimated function and its derivative:

act(x) = sum(net(x))
pact(x) = act(cu([x])).data[1]

dact(x) = gradient(act, x; nest = true)[1]
pdact(x) = dact(cu([x])).data[1]
plot(pdact, 0, 20)

As you can see on the chart, the derivative is mostly flat except for a sharp negative peak around 15. The second-order derivative should be large in that area, but its graph is a plain flat 0:

d2act(x) = gradient(x -> sum(dact(x)), x; nest = true)[1] # dact returns a 1-element array, so sum to get a scalar
d2act(cu([16f0])) # should be a large value near the kink
pd2act(x) = d2act(cu([x])).data[1]
plot(pd2act, 0, 20)

If you replace the network with a small untrained one ending in a sigmoid, say Chain(Dense(1,20,relu), Dense(20,1,sigmoid)), then the second-order derivative plots fine.
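As an autodiff-independent sanity check, a central-difference estimate can be compared against whatever Tracker returns. A minimal sketch in plain Julia; `d2_fd` is a hypothetical helper (not part of the snippet above), and the idea would be to apply it to the scalar wrapper `pact` defined earlier:

```julia
# Hypothetical helper: central-difference estimate of the second derivative,
# for cross-checking the autodiff result.
d2_fd(f, x; h = 1e-2) = (f(x + h) - 2f(x) + f(x - h)) / h^2

# Smoke test on a known function: (x^3)'' = 6x, so at x = 2 this should be ~12.
d2_fd(x -> x^3, 2.0)

# For the network one would then compare, e.g., d2_fd(pact, 16f0) vs. pd2act(16f0).
```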


Nevermind, I guess that since a neural network with ReLU activations is a piecewise-linear approximation (with an exponential number of pieces), it makes sense that its second-order derivative is 0 almost everywhere.
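A minimal illustration of this, in plain Julia with made-up fixed weights: any composition of affine maps and ReLUs is piecewise linear, so a central-difference second derivative is exactly 0 inside any linear piece, while a smooth function gives nonzero curvature:

```julia
relu(x) = max(zero(x), x)

# A tiny hand-wired "network": nested ReLUs with arbitrary fixed weights.
# Whatever the weights, the result is piecewise linear in x.
f(x) = relu(3relu(2x - 1) - relu(-x) + 0.5)

h = 1e-3
d2(g, x) = (g(x + h) - 2g(x) + g(x - h)) / h^2

d2(f, 0.3)    # 0.0: x = 0.3 sits strictly inside a linear piece
d2(sin, 0.3)  # ≈ -sin(0.3): a smooth function has nonzero curvature
```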
