How to get second order gradient of a neural network?


I am trying to plot the second-order derivative of a single-input, single-output neural network. I can compute the first-order derivative just fine, but the second-order derivative computed with the Tracker package is simply 0, and the true second derivative should not be 0 everywhere. Maybe this is a vanishing-gradient issue? To reproduce, you can quickly train a similar network with this code:

using Flux, Tracker, CuArrays

net = Chain(Dense(1, 256, relu),
            Dense(256, 256, relu),
            Dense(256, 1, relu)) |> gpu

S = 70
s = 15
function opt(x)
    if x <= s
        return S - x
    else
        return 0
    end
end

function loss(x, y)
    ÿ = sum(net(x), dims = 1)
    Flux.mse(ÿ, y)  # return a scalar loss
end

x = hcat(collect(-50f0:0.1f0:100f0)...)
y = opt.(x)
x = cu(x)
y = hcat(y...) |> gpu
op = ADAM()  # the optimiser; the original snippet used `op` without defining it
Flux.train!(loss, Flux.params(net), Iterators.repeated((x, y), 1000), op)

Plot the estimated function and its derivative:

act(x) = sum(net(x))
pact(x) = act(cu([x])).data[1]

dact(x) = gradient(act, x; nest = true)[1]
pdact(x) = dact(cu([x])).data[1]
plot(pdact, 0, 20)

As you can see on the chart, the derivative is mostly flat except for a sharp negative peak around 15. The second-order derivative should be large in that area, but its graph is a plain flat 0:

d2act(x) = gradient(x -> sum(dact(x)), x; nest = true)[1] # dact returns a 1-element array, so sum to get a scalar
d2act(cu([16f0])) # should be a large value near the kink
pd2act(x) = d2act(cu([x])).data[1]
plot(pd2act, 0, 20)

If you replace the network with a small untrained one ending in a sigmoid, say Chain(Dense(1,20,relu), Dense(20,1,sigmoid)), then the second-order derivative plots fine.
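As an autodiff-independent sanity check, a central-difference estimate can be compared against whatever Tracker returns. A minimal sketch in plain Julia; `d2_fd` is a hypothetical helper (not part of the snippet above), and the idea would be to apply it to the scalar wrapper `pact` defined earlier:

```julia
# Hypothetical helper: central-difference estimate of the second derivative,
# for cross-checking the autodiff result.
d2_fd(f, x; h = 1e-2) = (f(x + h) - 2f(x) + f(x - h)) / h^2

# Smoke test on a known function: (x^3)'' = 6x, so at x = 2 this should be ~12.
d2_fd(x -> x^3, 2.0)

# For the network one would then compare, e.g., d2_fd(pact, 16f0) vs. pd2act(16f0).
```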


Nevermind, I guess that since a neural network with ReLU activations is a piecewise-linear approximation (with an exponential number of pieces), it makes sense that its second-order derivative is 0 almost everywhere.
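A minimal illustration of this, in plain Julia with made-up fixed weights: any composition of affine maps and ReLUs is piecewise linear, so a central-difference second derivative is exactly 0 inside any linear piece, while a smooth function gives nonzero curvature:

```julia
relu(x) = max(zero(x), x)

# A tiny hand-wired "network": nested ReLUs with arbitrary fixed weights.
# Whatever the weights, the result is piecewise linear in x.
f(x) = relu(3relu(2x - 1) - relu(-x) + 0.5)

h = 1e-3
d2(g, x) = (g(x + h) - 2g(x) + g(x - h)) / h^2

d2(f, 0.3)    # 0.0: x = 0.3 sits strictly inside a linear piece
d2(sin, 0.3)  # ≈ -sin(0.3): a smooth function has nonzero curvature
```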
