# Second derivative of custom neural network - gradient error

I am trying to take the second derivative of a custom neural network like this:

```julia
icnn = ICNN(1, [32, 32], soft_relu(0.1))
function df(x)
    # first derivative of the ICNN; the original body is not shown,
    # this is a typical Zygote formulation
    gradient(z -> sum(icnn(z)), x)[1]
end
df([1])
```

This worked well for the first derivative. For the second derivative, however, I get:

```
Can't differentiate foreigncall expression
```

Here, `icnn` is simply a custom neural network, implemented as follows:

```julia
# soft ReLU
function soft_relu(d)
    x -> max.(clamp.(sign.(x) .* 1/(2*d) .* x.^2, 0, d/2), x .- d/2)
end

################################################################################
# Input Convex Neural Network (ICNN)
struct ICNN
    Ws
    Us
    bs
    act
end

# constructor
ICNN(input_dim::Integer, layer_sizes::Vector, activation) = begin
    Ws = []
    Us = []
    bs = []
    push!(Ws, randn(layer_sizes[1], input_dim))
    push!(bs, randn(layer_sizes[1]))
    i = 1
    for out in layer_sizes[2:end]
        push!(Us, randn(out, input_dim))
        push!(Ws, randn(out, layer_sizes[i]))
        push!(bs, randn(out))
        i += 1
    end
    push!(Us, randn(1, input_dim))
    push!(Ws, randn(1, layer_sizes[end]))
    push!(bs, randn(1))
    ICNN(Ws, Us, bs, activation)
end

# forward pass
(m::ICNN)(x) = begin
    z = m.act(m.Ws[1]*x + m.bs[1])
    for i in 1:length(m.Us)
        z = m.act(m.Ws[i+1]*z + m.Us[i]*x + m.bs[i+1])
    end
    return z
end
```

Is it possible to take second derivatives, and is there any best practice for working with nested gradients?


It appears you are using Zygote. In that case, the most robust method (for me) has been to mix forward- and reverse-mode differentiation. This is what `Zygote.hessian` does under the hood. That may suffice for your problem.
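A minimal sketch of that mixed-mode approach on a hypothetical scalar function (a stand-in, not the ICNN):

```julia
using Zygote, ForwardDiff

f(x) = x[1]^2 * x[2]   # hypothetical scalar-valued stand-in

x = [1.0, 2.0]

# forward-over-reverse, which is what Zygote.hessian does internally
H = Zygote.hessian(f, x)

# the same mix written out explicitly:
# ForwardDiff (forward mode) over Zygote.gradient (reverse mode)
H2 = ForwardDiff.jacobian(z -> Zygote.gradient(f, z)[1], x)
# H ≈ H2 ≈ [4.0 2.0; 2.0 0.0]
```

Nesting `Zygote.gradient` inside itself (reverse over reverse), by contrast, is what typically triggers the `Can't differentiate foreigncall expression` error.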


Thanks a lot for your answer! I am using Flux, but I don't really understand the difference between Flux and Zygote. Does Flux implicitly use Zygote for differentiation?
I am afraid that `Zygote.hessian` won't be enough for what I need. I posted the above code as a simple example, but I actually need to define a new neural network that involves gradients of the ICNN defined above. Then I would like to train this new network using gradient descent. The actual function I want to differentiate is:

```julia
(m::StableDynamics)(x) = begin
    # body omitted here; it combines m.f_hat(x) with the gradient
    # of the ICNN m.v with respect to x
end
```

Here, `m.v` is an ICNN and `m.f_hat` is a standard dense neural network. But when I try to differentiate this with respect to `x`, I get an error inside the ICNN.

However, the above code does work with `Zygote.hessian`. I can also calculate the derivative of `StableDynamics()` using `Zygote.forward_jacobian()`. So I guess that for training this neural network I will just have to use `forward_jacobian()`. Is there a simple reason for this?
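For reference, the forward-mode pattern mentioned above can be sketched on a small stand-in (the function `v` below is hypothetical, playing the role of the ICNN `m.v`, which maps a vector to a 1-vector):

```julia
using Zygote

# hypothetical stand-in for the ICNN m.v
v(x) = [sum(abs2, x)]

# Zygote.forward_jacobian computes the value and the (transposed) Jacobian
# using forward-mode dual numbers instead of reverse-mode tracing
y, J = Zygote.forward_jacobian(v, [1.0, 2.0])
# vec(J) is the gradient of x -> x[1]^2 + x[2]^2, i.e. [2.0, 4.0]
```

Because the inner derivative is then plain dual-number arithmetic rather than a second reverse-mode pass, this is presumably how mixing modes sidesteps the `foreigncall` error that appears when reverse mode is nested inside itself.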