I am trying to plot the second-order derivative of a single-input, single-output neural network. I can compute and plot the first-order derivative just fine, but the second-order derivative computed with the Tracker package comes out as a constant 0, even though it should not be 0 everywhere. Could this be a vanishing gradient issue? To reproduce, you can quickly train a similar network with this code:
```julia
using Flux, Tracker, CuArrays

net = Chain(
    Dense(1, 256, relu),
    Dense(256, 256, relu),
    Dense(256, 1, relu)) |> gpu

S = 70
s = 15

# Target: S - x below the threshold s, 0 above it.
function opt(x)
    if x <= s
        return S - x
    else
        return 0f0   # Float32 literal keeps the element type stable
    end
end

function loss(x, y)
    ÿ = sum(net(x), dims = 1)
    Flux.mse(ÿ, y)
end

op = ADAM()
x = hcat(collect(-50f0:0.1f0:100f0)...)   # 1×N input matrix
y = opt.(x)
x = cu(x)
y = hcat(y...) |> gpu
Flux.train!(loss, Flux.params(net), Iterators.repeated((x, y), 1000), op)
```
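For completeness, here is the quick sanity check I run after training (a minimal sketch reusing `net`, `loss`, `x`, and `y` from above; the probe points are arbitrary):

```julia
# Quick convergence check (sketch); the probe points are arbitrary.
@show loss(x, y)              # should be small after 1000 iterations
@show net(cu([10f0])).data    # expect roughly S - 10 = 60
@show net(cu([30f0])).data    # expect roughly 0
```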
Plot the estimated function and its derivative:
```julia
using Plots

act(x) = sum(net(x))
pact(x) = act(cu([x])).data
plot(pact, 0, 20)

# With `nest = true`, `gradient` (from Tracker) returns a tuple holding
# the tracked gradient array.
dact(x) = gradient(act, x; nest = true)
pdact(x) = dact(cu([x]))[1].data[1]
plot(pdact, 0, 20)
```
As the chart shows, the estimated derivative is mostly flat, with a sharp negative peak around x = 15. The second-order derivative should therefore take large values in that region, but its graph is a flat 0:
```julia
# dact returns a tuple holding a 1-element array, so I sum over it to
# get the scalar that gradient requires.
d2act(x) = gradient(x -> sum(dact(x)[1]), x; nest = true)
pd2act(x) = d2act(cu([x]))[1].data[1]   # should be a large value near x = 15
plot(pd2act, 0, 20)
```
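As an AD-independent cross-check (a minimal sketch; `h` and `fd2` are names I introduce here, not part of the code above), a central finite difference over `pact` gives a numerical estimate of the second derivative:

```julia
# Central finite-difference estimate of the second derivative, as a
# cross-check against the Tracker result. `h` and `fd2` are
# illustrative names introduced for this sketch.
h = 0.01f0
fd2(x) = (pact(x + h) - 2f0 * pact(x) + pact(x - h)) / h^2
plot(fd2, 0, 20)
```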
If you replace the network with a small, untrained one whose output layer is a sigmoid, say `Chain(Dense(1, 20, relu), Dense(20, 1, sigmoid))`, then the second-order derivative is plotted just fine.
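Concretely, that comparison looks like this (a minimal sketch; `net2` and the other `*2` names are introduced just for this example, and the network is deliberately left untrained):

```julia
# Untrained network with a sigmoid output layer: its second-order
# derivative plots as a smooth, nonzero curve. The `*2` names are
# introduced just for this comparison.
net2 = Chain(Dense(1, 20, relu), Dense(20, 1, sigmoid)) |> gpu
act2(x) = sum(net2(x))
dact2(x) = gradient(act2, x; nest = true)
d2act2(x) = gradient(x -> sum(dact2(x)[1]), x; nest = true)
pd2act2(x) = d2act2(cu([x]))[1].data[1]
plot(pd2act2, 0, 20)
```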