How to obtain the gradients of intermediate variables with Flux

AquaIndigo · February 24, 2021, 5:47am

Hi, I would like to get the gradients of the outputs of some hidden layer. For example, given y = 5x and z = y / 4, I want to obtain $\frac{\partial z}{\partial y}$, where y is the intermediate output of a neural network. Is there a convenient way to do that?

AquaIndigo · February 24, 2021, 7:27am

Put simply, is there a way to implement Grad-CAM with Flux?

ToucheSir · February 24, 2021, 9:51pm

Since gs = gradient(() -> loss(x, y), params(model)) returns a collection containing gradients for every parameter, finding the gradient of the penultimate layer's weights is as simple as gs[layer.weight] (i.e. indexing with the actual parameter array).

For Grad-CAM specifically, you can try using Flux.activations to grab layer outputs, or simply stash them away somewhere (e.g. a local variable) in your loss function. No need to detach.

AquaIndigo · February 24, 2021, 11:48pm

Thanks for your answer, but what I want is the gradient with respect to some layer’s output. As far as I understand, gradient(params(model)) gives the gradient with respect to a layer’s parameters, i.e. its weights. Since the output is calculated dynamically, I’m afraid I cannot pass it into the gradient function.

AquaIndigo · February 25, 2021, 12:03am

And I tried

p2 = params(Flux.activations(m, x))
gs = gradient(p2) do
    loss(x, y)
end

And gs.params were all nothing

ToucheSir · February 25, 2021, 12:11am

Right, sorry, it’s been a while since I last worked with Grad-CAM. Since you only need the gradient with respect to a given layer’s activations, the easiest way is to compute those first and then compute the loss from them and the rest of the model, e.g.:

acts = m_upto_somelayer(x)
grads = gradient(a -> loss(m_after_somelayer(a), y), acts)[1]

Since you’re only looking for the gradient with respect to one array here, passing it in explicitly (as described on the "Basics" page of the Flux documentation) is easier than using params.
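As a minimal sketch of the split described above, assuming the model is a Chain and a recent Flux version that accepts the `Dense(in => out)` syntax: the toy model, the data, the split index, and the names `head` and `tail` are all hypothetical stand-ins for `m_upto_somelayer` and `m_after_somelayer`.

```julia
using Flux

# Hypothetical toy model and data, only for illustration.
model = Chain(Dense(4 => 8, relu), Dense(8 => 3))
x = rand(Float32, 4, 5)                       # 5 samples with 4 features each
y = Flux.onehotbatch(rand(1:3, 5), 1:3)       # random one-hot targets

head = model[1:1]      # plays the role of m_upto_somelayer
tail = model[2:end]    # plays the role of m_after_somelayer

# Compute the hidden activations first, then differentiate the loss
# with respect to them by passing them in as an explicit argument.
acts = head(x)
grads = gradient(a -> Flux.Losses.logitcrossentropy(tail(a), y), acts)[1]
```

Here `grads` has the same shape as `acts`, with one entry per hidden unit and sample.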

AquaIndigo · February 25, 2021, 1:02am

Thanks so much, that method is enough for me. One more question, though: is there a general way to get mixed partial derivatives of any order with respect to the outputs of several layers? With the method you mentioned, I think we would have to split the model n times to get the nth-order derivative.

ToucheSir · February 25, 2021, 1:20am

You don’t need to split the model at all to get higher order derivatives. That’s usually accomplished using nested AD, e.g. Zygote on top of ForwardDiff. I’m not the best resource on this, so I would recommend searching for previous posts about higher order AD on Discourse and asking on the Slack #autodiff channel if you get stuck.
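As a rough illustration of nested AD, here is a sketch using `Zygote.hessian`, which to my understanding applies forward-mode differentiation on top of Zygote's reverse-mode gradient. The function `f` and the input are hypothetical, and this is only one of several ways to nest AD packages.

```julia
using Zygote

# Hypothetical scalar-valued function, only for illustration.
f(x) = sum(x .^ 3)
x = [1.0, 2.0]

g = Zygote.gradient(f, x)[1]   # first-order derivatives: 3 .* x .^ 2
H = Zygote.hessian(f, x)       # second-order derivatives: 6 .* x on the diagonal
```

Note that no splitting of `f` is needed: the second-order result comes from differentiating the gradient computation itself.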

Now if you want to take the derivative of n separate layers at any order, then yes you’d need to split the model n times to use the method I described above. These splits don’t necessarily need to be stored separately, though. If your model is a Chain, for example, you could do something like model[start_layer_index:end_layer_index] to grab only the parts you want at any given time.
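A sketch of that slicing idea, assuming a Chain model and a recent Flux version: the toy model, the data, and the name `act_grads` are hypothetical. The splits are created on the fly by indexing and are never stored.

```julia
using Flux

# Hypothetical three-layer model and data, only for illustration.
model = Chain(Dense(4 => 8, relu), Dense(8 => 6, relu), Dense(6 => 3))
x = rand(Float32, 4, 5)
y = Flux.onehotbatch(rand(1:3, 5), 1:3)

# For each split point k, take the gradient of the loss with respect to
# the output of layer k.
act_grads = map(1:length(model) - 1) do k
    head = model[1:k]          # layers up to and including k
    tail = model[k + 1:end]    # the remaining layers
    acts = head(x)
    gradient(a -> Flux.Losses.logitcrossentropy(tail(a), y), acts)[1]
end
```

Each element of `act_grads` has the shape of the corresponding layer's output.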

AquaIndigo · February 25, 2021, 2:15am

I truly appreciate your timely help.

AquaIndigo · February 28, 2021, 12:58am

Sorry to bother you again, but I think that I can write the gradient as

ps2 = params(params(tail_layers))
gs = ForwardDiff.gradient(hidden_out) do x
    tmp_gs = Flux.gradient(ps2) do
        return Flux.Losses.logitcrossentropy(tail_layers(x), targ_c)
    end
    return tmp_gs[tmp_gs.params[1]][idc[1]]
end

to get $\frac{\partial\frac{\partial Loss}{\partial W^{(k)}}}{\partial O^{(k-1)}}$, where $O^{(k-1)}$ stands for hidden_out and $W^{(k)}$ stands for ps2[1]. However, $W^{(k-1)}$ and the weights of earlier layers don’t occur in the calculation of tail_layers(hidden_out), so this approach does not cover them. Do you know of some way to do that?

lgmendes · March 23, 2022, 1:04pm

Hi AquaIndigo,

I’m new to Julia and Flux, and I’m trying to find a Grad-CAM implementation. Can you provide more details about your implementation?

AquaIndigo · March 24, 2022, 1:24pm

I can’t really remember the details, but here is some old code that may help you.

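using Flux, ForwardDiff
using MLDatasets: CIFAR10

function grad_cam(i::Int, net, lst_conv::Int)
    # Load the i-th CIFAR-10 test image and its label (a 0-based class index).
    img, label = CIFAR10.testdata(Float32, i:i)
    inp, targ = img, Flux.onehotbatch(label, 0:9)
    # Activations of the last convolutional layer.
    h_out = net[1:lst_conv](inp)
    # The remaining layers, applied on top of those activations.
    m = net[(lst_conv + 1):end]

    # Gradient of the target class score with respect to the activations.
    gcam = ForwardDiff.gradient(h_out) do x
        return m(x)[label[1] + 1]
    end
    return gcam
end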
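
The function above returns only the gradient of the class score with respect to the convolutional activations. In the usual Grad-CAM formulation, those gradients are then averaged over the spatial dimensions to get one weight per channel, and the heatmap is the ReLU of the weighted sum of the activations. A minimal sketch of that last step follows, assuming Flux's width × height × channel × batch layout. The name `grad_cam_map` and the random inputs are hypothetical and not part of the original code.

```julia
using Statistics

# acts, grads: W×H×C×N arrays (activations and their gradients).
function grad_cam_map(acts::AbstractArray{T,4}, grads::AbstractArray{T,4}) where {T}
    weights = mean(grads; dims=(1, 2))     # 1×1×C×N: one weight per channel
    cam = sum(weights .* acts; dims=3)     # W×H×1×N: weighted sum over channels
    return max.(cam, zero(T))              # ReLU keeps only positive evidence
end

# Usage with random stand-ins for h_out and the gradient returned by grad_cam.
acts = rand(Float32, 8, 8, 16, 1)
grads = randn(Float32, 8, 8, 16, 1)
heatmap = grad_cam_map(acts, grads)
```

The resulting `heatmap` has the spatial size of the convolutional feature map, so it usually needs to be upsampled to the input image size before overlaying.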
