Second derivative of custom neural network - gradient error

I am trying to take the second derivative of a custom neural network like this,

using Flux  # Flux re-exports Zygote's `gradient`

icnn = ICNN(1, [32, 32], soft_relu(0.1))
function df(x)
    gradient(y -> sum(icnn(y)), x)[1]
end
df([1.0])
gradient(y -> sum(df(y)), [1.0])

This worked well for the first derivative. However, for the second derivative I get the error `Can't differentiate foreigncall expression`.
Here, icnn is simply a custom neural network, implemented as follows:

# soft ReLU
function soft_relu(d)
    x -> max.(clamp.(sign.(x) .* 1/(2*d) .* x.^2, 0, d/2), x .- d/2)
end
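As a quick sanity check (my own, not from the thread), soft_relu(d) behaves like a smoothed ReLU: quadratic in a band of width d around zero and linear outside it, which keeps it twice differentiable and therefore suitable when second derivatives are the goal:

```julia
# soft ReLU: quadratic near zero, linear (x - d/2) for larger x
soft_relu(d) = x -> max.(clamp.(sign.(x) .* 1/(2d) .* x.^2, 0, d/2), x .- d/2)

σ = soft_relu(0.1)
σ([-1.0, 0.0, 1.0])  # ≈ [0.0, 0.0, 0.95], i.e. ReLU-like away from the origin
```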
################################################################################
# Input Convex Neural Network (ICNN)
struct ICNN
    Ws
    Us
    bs
    act
end
# constructor
ICNN(input_dim::Integer, layer_sizes::Vector, activation) = begin
    Ws = []
    Us = []
    bs = []
    push!(Ws, randn(layer_sizes[1], input_dim))
    push!(bs, randn(layer_sizes[1]))
    i = 1
    for out in layer_sizes[2:end]
        push!(Us, randn(out, input_dim))
        push!(Ws, randn(out, layer_sizes[i]))
        push!(bs, randn(out))
        i += 1
    end
    push!(Us, randn(1, input_dim))
    push!(Ws, randn(1, layer_sizes[end]))
    push!(bs, randn(1))
    ICNN(Ws, Us, bs, activation)
end
# forward pass
(m::ICNN)(x) = begin
    z = m.act(m.Ws[1]*x + m.bs[1])
    for i in 1:length(m.Us)
        z = m.act(m.Ws[i+1]*z + m.Us[i]*x + m.bs[i+1])
    end
    return z
end

Is it possible to take second derivatives, and is there a best practice for working with nested gradients?

Thanks in advance for your help!


It appears you are using Zygote. In that case, the most robust approach (for me) has been to mix forward- and reverse-mode differentiation, which is what Zygote.hessian does under the hood. That may suffice for your problem.
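A minimal sketch of that mixing, with a toy scalar function standing in for `x -> sum(icnn(x))` (the toy `f` is my stand-in, not the OP's model; Zygote and ForwardDiff assumed installed):

```julia
using Zygote, ForwardDiff

f(x) = sum(x .^ 3)                  # stand-in for x -> sum(icnn(x))
df(x) = Zygote.gradient(f, x)[1]    # reverse-mode inner gradient

x0 = [1.0, 2.0]
H1 = Zygote.hessian(f, x0)          # forward-over-reverse under the hood
H2 = ForwardDiff.jacobian(df, x0)   # the same mixing written out by hand
# both give the Hessian diag(6 .* x0), here [6 0; 0 12]
```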


Thanks a lot for your answer! I am using Flux, but I don't really understand the difference between Flux and Zygote. Does Flux implicitly use Zygote for differentiation?
I am afraid that Zygote.hessian won't be enough for what I need. I posted the code above as a simple example, but I actually need to define a new neural network that involves gradients of the ICNN defined above, and then train this new network with gradient descent. The actual function I want to differentiate is,

(m::StableDynamics)(x) = begin
    grad_v = gradient(m.v, x)[1]
    return m.f_hat(x) - grad_v * relu(grad_v'*m.f_hat(x) + 0.9 * m.v(x)) / (grad_v'*grad_v)
end

Where m.v is an ICNN and m.f_hat is a standard dense neural network. But when I try to differentiate this with respect to x, I get an error in the ICNN.

However, the above code does work with Zygote.hessian 🙂 I can also calculate the derivative of StableDynamics() using Zygote.forward_jacobian(), so I guess that for training this neural network I will just have to use forward_jacobian(). Is there a simple reason for this?
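As I understand it, the short reason is that reverse-over-reverse would require Zygote to differentiate its own generated pullback code, which bottoms out in low-level foreigncall and mutation it cannot handle, whereas forward-over-reverse just pushes dual numbers through the inner reverse pass. A hedged sketch with toy stand-ins for m.v and m.f_hat (mine, not the real networks):

```julia
using Zygote, ForwardDiff

v(x) = sum(abs2, x) / 2   # toy convex potential standing in for the ICNN
f_hat(x) = -x             # toy dynamics standing in for the dense network

function stable_dynamics(x)
    grad_v = Zygote.gradient(v, x)[1]
    f = f_hat(x)
    # max(0, s) plays the role of relu(s) to keep the sketch dependency-free
    f .- grad_v .* max(0, grad_v' * f + 0.9 * v(x)) ./ (grad_v' * grad_v)
end

# forward-mode outer differentiation pushes duals through the inner
# reverse-mode gradient, so no reverse-over-reverse is ever attempted
J = ForwardDiff.jacobian(stable_dynamics, [1.0, 2.0])
```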