How to obtain the gradients of intermediate variables with Flux

Hi, I would like to get the gradients of the outputs of some hidden layer. For example, given y = 5x and z = y / 4, I want to obtain \frac{\partial z}{\partial y}, but y is the intermediate output of an NN. How can I do that conveniently?

Simply speaking, are there ways to implement Grad-CAM via Flux?

Since gs = gradient(() -> loss(x, y), params(model)) returns a collection containing gradients for every parameter, finding the gradient of the penultimate layer is as simple as gs[layer.weight] (i.e. indexing with the actual parameter array).
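For example (a minimal sketch; the model, data, and loss below are illustrative, not from this thread):

```julia
using Flux

model = Chain(Dense(4 => 3, relu), Dense(3 => 2))
x = rand(Float32, 4, 8)
y = rand(Float32, 2, 8)

ps = params(model)
gs = gradient(() -> Flux.mse(model(x), y), ps)

# Index the Grads object with the actual parameter array:
dW = gs[model[end].weight]   # same shape as model[end].weight, i.e. (2, 3)
```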

For Grad-CAM specifically, you can try using activations to grab layer outputs or simply stash them away somewhere (e.g. a local variable) in your loss function. No need to detach.
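As a sketch of the activations route (the layer sizes here are made up):

```julia
using Flux

m = Chain(Dense(4 => 8, relu), Dense(8 => 2))
x = rand(Float32, 4, 3)

acts = Flux.activations(m, x)   # tuple holding each layer's output
# acts[1] is the hidden layer's output, size (8, 3)
# acts[2] is the final output, size (2, 3)
```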

Thanks for your answer, but what I want is the gradient of some layer’s output. As far as I understand, gradient(params(model)) gives the gradients of a layer’s parameters, i.e. its weights. Since the outputs are computed dynamically, I’m afraid I cannot pass them into the gradient function.

And I tried

p2 = params(Flux.activations(m, x))
gs = gradient(p2) do
    loss(x, y)
end

and the gradients in gs were all nothing.

Right, sorry, it’s been a while since I last worked with Grad-CAM. Since you only need the gradient wrt. a given layer’s activations, the easiest way is to compute those activations first and then compute the loss from them with the rest of the model, e.g.:

acts = m_upto_somelayer(x)
grads = gradient(a -> loss(m_after_somelayer(a), y), acts)[1]

Since you’re only looking for the gradient of a single array here, passing it in explicitly per Basics · Flux is easier than using params.
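Concretely, if the model is a Chain, the two halves can be built by slicing (a sketch with made-up layer sizes; m_upto_somelayer and m_after_somelayer correspond to the names above):

```julia
using Flux

m = Chain(Dense(4 => 8, relu), Dense(8 => 2))
m_upto_somelayer  = m[1:1]    # layers up to (and including) the one of interest
m_after_somelayer = m[2:end]  # the rest of the model

x = rand(Float32, 4, 5)
y = Flux.onehotbatch(rand(0:1, 5), 0:1)
loss(ŷ, y) = Flux.logitcrossentropy(ŷ, y)

acts = m_upto_somelayer(x)
grads = gradient(a -> loss(m_after_somelayer(a), y), acts)[1]
# grads has the same shape as acts
```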

Thanks so much; that method seems sufficient for me. But beyond that, do you know a general way to get mixed partial derivatives of any order with respect to many layers’ outputs? With the method you mentioned, I think we would have to split the model n times to get an nth-order derivative.

You don’t need to split the model at all to get higher order derivatives. That’s usually accomplished using nested AD, e.g. Zygote on top of ForwardDiff. I’m not the best resource on this, so I would recommend searching for previous posts about higher order AD on Discourse and asking on the Slack #autodiff channel if you get stuck.
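For a scalar function, one common nesting (ForwardDiff over Zygote, i.e. forward over reverse) looks like the sketch below; the result can be checked against the analytic second derivative d²/dx² sin(x²) = 2cos(x²) − 4x² sin(x²):

```julia
using ForwardDiff, Zygote

f(x) = sin(x^2)
df(x)  = Zygote.gradient(f, x)[1]        # first derivative, reverse mode
d2f(x) = ForwardDiff.derivative(df, x)   # second derivative, forward over reverse

d2f(0.5)   # should match 2cos(0.25) - 4 * 0.25 * sin(0.25)
```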

Now if you want to take the derivative of n separate layers at any order, then yes you’d need to split the model n times to use the method I described above. These splits don’t necessarily need to be stored separately, though. If your model is a Chain, for example, you could do something like model[start_layer_index:end_layer_index] to grab only the parts you want at any given time.

I truly appreciate your timely help.


Sorry to bother you again, but I think I can write the gradient as

ps2 = params(tail_layers)
gs = ForwardDiff.gradient(hidden_out) do x
    tmp_gs = Flux.gradient(ps2) do
        Flux.Losses.logitcrossentropy(tail_layers(x), targ_c)
    end
    return tmp_gs[ps2[1]][idc[1]]
end

to get \frac{\partial}{\partial O^{(k-1)}}\left(\frac{\partial Loss}{\partial W^{(k)}}\right), where O^{(k-1)} stands for hidden_out and W^{(k)} stands for ps2[1]. However, W^{(k-1)} and the parameters of earlier layers don’t occur in the computation of tail_layers(hidden_out). Could you suggest a way to handle that?

Hi AquaIndigo,

I’m new to Julia and Flux, and I’m trying to find a Grad-CAM implementation. Could you provide more details about yours?

I can’t really remember the details, but here is some earlier code that may help you.

function grad_cam(i::Int, net, lst_conv::Int)
    img, label = CIFAR10.testdata(Float32, i:i)
    inp, targ = img, Flux.onehotbatch(label, 0:9)
    h_out = net[1:lst_conv](inp)            # activations up to the last conv layer
    m = net[(lst_conv + 1):end]             # remainder of the network
    gcam = ForwardDiff.gradient(h_out) do x
        return m(x)[label[1] + 1]           # score of the true class
    end
    return gcam
end