Hi,

I am trying to plot the second-order derivative of a single-input, single-output neural network. The first-order derivative works fine, but the second-order derivative computed with the Tracker package is identically 0, even though the true second derivative should not be 0 everywhere. Could this be a vanishing-gradient issue? To reproduce, you can quickly train a similar network with this code:

```
using Flux, Tracker, CuArrays
net = Chain( Dense(1,256, relu),
Dense(256,256, relu),
Dense(256,1,relu)) |> gpu
S = 70
s = 15
function opt(x)
if x <= s
return S-x
else
return 0
end
end
function loss(x,y)
ŷ = sum(net(x), dims = 1)
Flux.mse(ŷ,y)
end
op=ADAM()
x = reshape(collect(-50f0:0.1f0:100f0), 1, :)  # 1×N Float32 matrix: one feature, N samples
y = Float32.(opt.(x))                          # targets as a 1×N Float32 matrix
x = cu(x)
y = gpu(y)
Flux.train!(loss, Flux.params(net), Iterators.repeated((x,y), 1000), op)
```
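Note that the target `opt` is not merely non-smooth at `s`: with `S = 70` and `s = 15` it drops from `S - s = 55` to `0` in one step, so the training data contain a jump. A minimal plain-Julia check, assuming the same `S` and `s`; `opt_stable` is a hypothetical type-stable variant that returns `zero(x)` instead of the integer `0`:

```julia
# Same constants as in the training script.
const S = 70
const s = 15

# Hypothetical type-stable variant of `opt`: returns zero(x) (a Float32 for
# Float32 input) instead of the Int literal 0.
opt_stable(x) = x <= s ? S - x : zero(x)

left  = opt_stable(15f0)             # input at s
right = opt_stable(15.1f0)           # input 0.1 above s
slope = (right - left) / 0.1f0       # finite-difference slope across the jump

println("opt(15) = $left, opt(15.1) = $right, slope across the jump = $slope")
```

Across a 0.1 step the data imply a slope of about -550, compared with -1 to the left of `s` and 0 to the right, so the fitted network has to represent a very steep drop near 15.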

Then plot the estimated function and its first derivative:

```
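using Plots                                  # provides plot(f, xmin, xmax)
act(x) = sum(net(x))                         # scalar network output for a 1-element input
act(cu([10f0]))
pact(x) = act(cu([x])).data[1]
plot(pact, 0, 20)
dact(x) = gradient(act, x; nest = true)[1]   # first derivative; nest = true so it can be differentiated again
dact(cu([10f0]))
pdact(x) = dact(cu([x])).data[1]
plot(pdact, 0, 20)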
```

As you can see on the chart, the first derivative is mostly flat except for a sharp negative peak around x = 15. The second-order derivative should therefore take large values (in magnitude) in this region, but its graph is flat at 0:

```
d2act(x) = gradient((x) -> sum(dact(x)), x; nest = true)[1] #dact returns a 1-element array so I sum over it.
d2act(cu([16f0])) #should be a large value
pd2act(x) = d2act(cu([x])).data[1]
plot(pd2act, 0, 20)
```
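To separate an AD problem from a property of the fitted function itself, the nested Tracker result can be cross-checked against finite differences, which use only function values. A minimal plain-Julia sketch; the names `d1_fd`, `d2_fd`, `cube` and the default step `h` are hypothetical choices, checked here on x^3, whose first and second derivatives at x = 2 are both 12:

```julia
# Central-difference estimates of f'(x) and f''(x) for a scalar function f.
d1_fd(f, x; h = 1e-3) = (f(x + h) - f(x - h)) / (2h)
d2_fd(f, x; h = 1e-3) = (f(x + h) - 2f(x) + f(x - h)) / h^2

# Known test function: cube'(2) = 3 * 2^2 = 12 and cube''(2) = 6 * 2 = 12.
cube(x) = x^3

println("f'(2)  ≈ ", d1_fd(cube, 2.0))
println("f''(2) ≈ ", d2_fd(cube, 2.0))
```

Applied to the functions above, a call such as `d2_fd(pact, 15.0; h = 0.05)` could serve as an AD-independent estimate of the curvature near the drop; with Float32 on the GPU, `h` should not be made too small, or rounding error dominates.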

If you replace the network with a small, untrained one with a sigmoid output layer, say `Chain(Dense(1,20,relu), Dense(20,1,sigmoid))`, then the second-order derivative is plotted as expected.
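The ReLU-versus-sigmoid contrast can be reproduced without Flux or Tracker. The following minimal sketch uses plain Julia and a 1-2-1 network with hand-picked weights; the names `hidden`, `net_relu`, `net_sigmoid`, `d2_fd` and all weight values are hypothetical. A composition of affine layers and `relu` is piecewise linear in its input, so a central second difference is essentially 0 away from the kinks and grows like 1/h at a kink, whereas a sigmoid output layer has nonzero curvature over an interval:

```julia
# Activations written out by hand (no Flux).
relu(z) = max(zero(z), z)
sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical weights: hidden-layer kinks at x = -0.5 and x = 0.5.
const W1 = [1.0, -2.0]; const b1 = [0.5, 1.0]   # hidden layer, 1 -> 2
const W2 = [0.7, 0.3];  const b2 = -0.2         # output layer, 2 -> 1

hidden(x) = relu.(W1 .* x .+ b1)
net_relu(x)    = relu(sum(W2 .* hidden(x)) + b2)     # all relu: piecewise linear in x
net_sigmoid(x) = sigmoid(sum(W2 .* hidden(x)) + b2)  # sigmoid output: curved

# Central second difference, an AD-independent estimate of f''(x).
d2_fd(f, x; h = 1e-3) = (f(x + h) - 2f(x) + f(x - h)) / h^2

println("relu,    x = 2.0:           ", d2_fd(net_relu, 2.0))     # ~0 on a linear piece
println("sigmoid, x = 2.0:           ", d2_fd(net_sigmoid, 2.0))  # about -0.046
println("relu,    x = 0.5, h = 1e-3: ", d2_fd(net_relu, 0.5))     # ~600, i.e. 0.6 / h
println("relu,    x = 0.5, h = 1e-4: ", d2_fd(net_relu, 0.5; h = 1e-4))  # ~6000
```

At the kink the slope jumps from 0.1 to 0.7, and the second difference scales like that jump divided by h: the curvature of a ReLU network is concentrated at isolated points rather than spread over an interval.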