Flux/Zygote: Gradient with respect to inputs and implicit parameters (in 2021)

Hello all,

I have struggled a bit to get the gradient of a loss for a Flux/Zygote model with respect to both the model parameters and the model input.
Is it even sensible to do this in one pass?
Consider this simple setup:

using Flux
layer = Dense(2,3)
layer_params = params( layer )
loss_fn(y_pred, y) = Flux.Losses.mse(y_pred, y)

x = rand(2)       # current sample
target = ones(3)

Now both calls work as expected:
Taking the gradient with respect to the model parameters

gradient( () -> loss_fn( layer(x), target ),  layer_params )

and with respect to the inputs:

gradient( ( _x ) -> loss_fn( layer( _x ), target ),  x )

But due to the way we take gradients with respect to implicit parameters, I could not get both in one call, e.g., this does not work:

gradient( ( _x ) -> loss_fn( layer( _x ), target ), x, layer_params )

In some old discussion (which sadly I cannot find anymore) I read that you can wrap x in a Flux.Params object. However, there only appears to be a method with the signature

gradient( :: Function, :: Params )

so that currently (for multiple Params) I do

input_params = params(x)
ps = union( input_params, layer_params )
gradient( () -> loss_fn( layer( x ), target ), ps )

and this works.
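For context, a loop over several samples would then look roughly like this (just a sketch; `samples` is made-up data and the `update!` step is only there to illustrate a training loop):

samples = [rand(2) for _ in 1:10]
opt = Descent(0.1)

for x in samples
    ps = union( params(x), layer_params )
    grads = gradient( () -> loss_fn( layer(x), target ), ps )
    Flux.update!(opt, layer_params, grads)   # update the model parameters
    # grads[x] is the gradient with respect to this particular sample
end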

However, I wonder whether this is performant, especially if I loop over many samples x.
Is there some other way to achieve what I am trying to do, or is it just a dumb idea?

I think your solution is the only one at the moment.
I also think the overhead should be small. In the union you essentially create a shallow copy of an IdDict, and that should be pretty fast compared to the cost of computing the gradient itself.

You can check it yourself: do a few iterations where you take the gradient only with respect to the parameters (no union), and then a few with your solution. The performance difference should be small.
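For example, something along these lines (a rough sketch using BenchmarkTools, assuming the setup from the post above; exact numbers will depend on your machine):

using BenchmarkTools

ps_all = union( params(x), layer_params )

@btime gradient( () -> loss_fn( layer($x), $target ), $layer_params )   # parameters only
@btime gradient( () -> loss_fn( layer($x), $target ), $ps_all )         # parameters + input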
