Second derivative of custom neural network - gradient error

I am trying to take the second derivative of a custom neural network like this,

using Flux  # Flux re-exports Zygote's `gradient`

icnn = ICNN(1, [32, 32], soft_relu(0.1))
function df(x)
    gradient(y -> sum(icnn(y)), x)[1]
end
df([1.0])
gradient(y -> sum(df(y)), [1.0])

This worked well for the first derivative. However, for the second derivative I get the error `Can't differentiate foreigncall expression`.
Here, icnn is simply a custom neural network, implemented as follows:

# soft ReLU
function soft_relu(d)
    x -> max.(clamp.(sign.(x) .* 1/(2*d) .* x.^2, 0, d/2), x .- d/2)
end
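As a quick sanity check (my own, not from the thread), soft_relu(d) behaves like a smoothed ReLU: quadratic in a band of width d around zero and linear outside it, which keeps it twice differentiable and therefore suitable when second derivatives are the goal:

```julia
# soft ReLU: quadratic near zero, linear (x - d/2) for larger x
soft_relu(d) = x -> max.(clamp.(sign.(x) .* 1/(2d) .* x.^2, 0, d/2), x .- d/2)

σ = soft_relu(0.1)
σ([-1.0, 0.0, 1.0])  # ≈ [0.0, 0.0, 0.95], i.e. ReLU-like away from the origin
```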
################################################################################
# Input Convex Neural Network (ICNN)
struct ICNN
    Ws
    Us
    bs
    act
end
# constructor
ICNN(input_dim::Integer, layer_sizes::Vector, activation) = begin
    Ws = []
    Us = []
    bs = []
    push!(Ws, randn(layer_sizes[1], input_dim))
    push!(bs, randn(layer_sizes[1]))
    i = 1
    for out in layer_sizes[2:end]
        push!(Us, randn(out, input_dim))
        push!(Ws, randn(out, layer_sizes[i]))
        push!(bs, randn(out))
        i += 1
    end
    push!(Us, randn(1, input_dim))
    push!(Ws, randn(1, layer_sizes[end]))
    push!(bs, randn(1))
    ICNN(Ws, Us, bs, activation)
end
# forward pass
(m::ICNN)(x) = begin
    z = m.act(m.Ws[1]*x + m.bs[1])
    for i in 1:length(m.Us)
        z = m.act(m.Ws[i+1]*z + m.Us[i]*x + m.bs[i+1])
    end
    return z
end

Is it possible to take second derivatives, and is there a best practice for working with nested gradients?

Thanks in advance for your help!


It appears you are using Zygote. In that case, the most robust approach (for me) has been to mix forward- and reverse-mode differentiation, which is what Zygote.hessian does under the hood. That may suffice for your problem.
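A minimal sketch of that mixing, with a toy scalar function standing in for `x -> sum(icnn(x))` (the toy `f` is my stand-in, not the OP's model; Zygote and ForwardDiff assumed installed):

```julia
using Zygote, ForwardDiff

f(x) = sum(x .^ 3)                  # stand-in for x -> sum(icnn(x))
df(x) = Zygote.gradient(f, x)[1]    # reverse-mode inner gradient

x0 = [1.0, 2.0]
H1 = Zygote.hessian(f, x0)          # forward-over-reverse under the hood
H2 = ForwardDiff.jacobian(df, x0)   # the same mixing written out by hand
# both give the Hessian diag(6 .* x0), here [6 0; 0 12]
```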


Thanks a lot for your answer! I am using Flux, but I don't really understand the difference between Flux and Zygote. Does Flux implicitly use Zygote for differentiation?
I am afraid that Zygote.hessian won't be enough for what I need. I posted the code above as a simple example, but I actually need to define a new neural network that involves gradients of the ICNN defined above, and then train this new network with gradient descent. The actual function I want to differentiate is,

(m::StableDynamics)(x) = begin
    grad_v = gradient(m.v, x)[1]
    return m.f_hat(x) - grad_v * relu(grad_v'*m.f_hat(x) + 0.9 * m.v(x)) / (grad_v'*grad_v)
end

Where m.v is an ICNN and m.f_hat is a standard dense neural network. But when I try to differentiate this with respect to x, I get an error in the ICNN.

However, the above code does work with Zygote.hessian 🙂 I can also calculate the derivative of StableDynamics() using Zygote.forward_jacobian(), so I guess that for training this neural network I will just have to use forward_jacobian(). Is there a simple reason for this?
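As I understand it, the short reason is that reverse-over-reverse would require Zygote to differentiate its own generated pullback code, which bottoms out in low-level foreigncall and mutation it cannot handle, whereas forward-over-reverse just pushes dual numbers through the inner reverse pass. A hedged sketch with toy stand-ins for m.v and m.f_hat (mine, not the real networks):

```julia
using Zygote, ForwardDiff

v(x) = sum(abs2, x) / 2   # toy convex potential standing in for the ICNN
f_hat(x) = -x             # toy dynamics standing in for the dense network

function stable_dynamics(x)
    grad_v = Zygote.gradient(v, x)[1]
    f = f_hat(x)
    # max(0, s) plays the role of relu(s) to keep the sketch dependency-free
    f .- grad_v .* max(0, grad_v' * f + 0.9 * v(x)) ./ (grad_v' * grad_v)
end

# forward-mode outer differentiation pushes duals through the inner
# reverse-mode gradient, so no reverse-over-reverse is ever attempted
J = ForwardDiff.jacobian(stable_dynamics, [1.0, 2.0])
```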