Second derivative of custom neural network - gradient error

I am trying to take the second derivative of a custom neural network like this:

using Flux  # Flux re-exports Zygote's gradient

icnn = ICNN(1, [32, 32], soft_relu(0.1))
function df(x)
    gradient(y -> sum(icnn(y)), x)[1]
end
df([1])
gradient(y -> sum(df(y)), [1])

This works well for the first derivative. For the second derivative, however, I get the error: Can't differentiate foreigncall expression.
Here, icnn is a custom neural network implemented as follows:

# soft ReLU
function soft_relu(d)
    x -> max.(clamp.(sign.(x) .* 1/(2*d) .* x.^2, 0, d/2), x .- d/2)
end
################################################################################
# Input Convex Neural Network (ICNN)
struct ICNN
    Ws   # layer-to-layer weights (applied to z)
    Us   # skip-connection weights (applied to the input x)
    bs   # biases
    act  # activation function
end
# constructor
ICNN(input_dim::Integer, layer_sizes::Vector, activation) = begin
    Ws = []
    Us = []
    bs = []
    push!(Ws, randn(layer_sizes[1], input_dim))
    push!(bs, randn(layer_sizes[1]))
    i = 1
    for out in layer_sizes[2:end]
        push!(Us, randn(out, input_dim))
        push!(Ws, randn(out, layer_sizes[i]))
        push!(bs, randn(out))
        i += 1
    end
    push!(Us, randn(1, input_dim))
    push!(Ws, randn(1, layer_sizes[end]))
    push!(bs, randn(1))
    ICNN(Ws, Us, bs, activation)
end
# forward pass
(m::ICNN)(x) = begin
    z = m.act(m.Ws[1]*x + m.bs[1])
    for i in 1:length(m.Us)
        z = m.act(m.Ws[i+1]*z + m.Us[i]*x + m.bs[i+1])
    end
    return z
end

Is it possible to take second derivatives like this, and is there any best practice for working with nested gradients?

Thanks in advance for your help!

It appears you are using Zygote. In that case, the most robust approach (for me) has been to mix forward- and reverse-mode differentiation: nesting reverse mode inside reverse mode makes Zygote differentiate its own pullback code, which often hits operations it cannot handle (hence the foreigncall error), while forward-over-reverse avoids that. This is what Zygote.hessian does under the hood, and it may suffice for your problem.

https://fluxml.ai/Zygote.jl/dev/utils/#Zygote.hessian
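For example, with the icnn and df from your post (a minimal sketch; Zygote.hessian composes a forward-mode Jacobian over a reverse-mode gradient):

using Zygote

# Hessian of x -> sum(icnn(x)), i.e. your second derivative; forward-over-reverse under the hood
H = Zygote.hessian(x -> sum(icnn(x)), [1.0])

# equivalently, a forward-mode Jacobian of your reverse-mode gradient df
H2 = Zygote.forward_jacobian(df, [1.0])[2]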

Thanks a lot for your answer! I am using Flux, but I don’t really understand the difference between Flux and Zygote. Does Flux implicitly use Zygote for differentiation?
I am afraid Zygote.hessian won't be enough for what I need, though. I posted the code above as a simple example, but I actually need to define a new neural network that involves gradients of the ICNN defined above, and then train that network with gradient descent. The actual function I want to differentiate is:

(m::StableDynamics)(x) = begin
    # gradient of the ICNN v with respect to the input x
    grad_v = gradient(m.v, x)[1]
    return m.f_hat(x) - grad_v * relu(grad_v'*m.f_hat(x) + 0.9 * m.v(x)) / (grad_v'*grad_v)
end

Here m.v is an ICNN and m.f_hat is a standard dense neural network. But when I try to differentiate this with respect to x, I get an error in the ICNN.
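For reference, StableDynamics is just a small container along these lines (a sketch, showing only the fields used above):

struct StableDynamics
    f_hat   # standard dense network (e.g. a Flux Chain)
    v       # the ICNN whose gradient appears in the forward pass
end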

However, the above code does work with Zygote.hessian :slight_smile: I can also calculate the derivative of StableDynamics() using Zygote.forward_jacobian(). So I guess for training this network I will just have to use forward_jacobian(). Is there a simple reason why that works while the nested gradient call does not?
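Concretely, this is the pattern that works for me (a sketch; m is an already-constructed StableDynamics, and the [2] index follows how Zygote.hessian itself uses forward_jacobian internally):

using Zygote

x0 = randn(1)                           # example input, dimension just for illustration
J = Zygote.forward_jacobian(m, x0)[2]   # forward-mode Jacobian of the dynamics at x0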