Gradient of a Flux model with respect to the weights

Hello everyone.

I’m attempting to obtain the gradient of a Flux model with respect to its weights.
Let me first show what I mean (i.e., what I want to achieve).

Suppose I have a very simple linear model such that

y(x) = W \cdot x + b.

Now, using this model I wish to obtain the following

\frac{\partial y}{\partial W} = x

and

\frac{\partial y}{\partial b} = 1

Further, in the nonlinear case, given an activation function \sigma, I have the model

f(x) = \sigma \left( W \cdot x + b \right)

and, writing z = W \cdot x + b for the pre-activation, I wish to obtain the following

\frac{\partial f}{\partial W} = \frac{\partial f}{\partial z} \frac{\partial z}{\partial W} = \sigma'(W \cdot x + b) \, x

and

\frac{\partial f}{\partial b} = \frac{\partial f}{\partial z} \frac{\partial z}{\partial b} = \sigma'(W \cdot x + b)
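As a quick sanity check of these formulas, a finite-difference comparison seems to agree (here I take \sigma = \tanh, so \sigma'(z) = 1 - \tanh(z)^2, with arbitrary numbers):

# Sanity check of the chain rule above with σ = tanh,
# so σ'(z) = 1 - tanh(z)^2; W, b and x are arbitrary
W = rand(1, 3)
b = rand(1)
x = rand(3)

f(W, b) = only(tanh.(W * x .+ b))   # scalar output, since W has one row
z = only(W * x .+ b)
h = 1e-6

analytic_dfdb = 1 - tanh(z)^2                       # σ'(z)
numeric_dfdb  = (f(W, b .+ h) - f(W, b)) / h

E = zeros(1, 3); E[1, 1] = h                        # perturb W[1, 1] only
analytic_dfdW1 = (1 - tanh(z)^2) * x[1]             # σ'(z) ⋅ x₁
numeric_dfdW1  = (f(W .+ E, b) - f(W, b)) / h

isapprox(analytic_dfdb,  numeric_dfdb;  atol=1e-4)  # true
isapprox(analytic_dfdW1, numeric_dfdW1; atol=1e-4)  # true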

I hope I haven’t made a mistake somewhere in these computations; please correct me if I have.

Anyway, both linear and nonlinear models can be readily implemented using the Dense layer from Flux.
What I currently have is the following MWE:

using Flux
using Random

Random.seed!(8129)

# Create a very simple model
# Note that this is a *linear* model !!
model = Flux.Dense(3, 1)
baseline = Flux.params(model)
display(baseline)

# Compute the gradient w.r.t. the weights
# We should be able to obtain the same parameters as before!
some_input = rand(3, 3)
some_output = model(some_input)
display(some_output)
grad = Flux.gradient(x -> model(x), some_input) # Gradient evaluated at the inputs
display(grad)

but the output is just an error:

ERROR: LoadError: Output should be scalar; gradients are not defined for output Float32[0.28850588 0.8061751 0.3004352]

How can I obtain the derivatives I’m looking for?

gradient expects the callback to return a scalar output representing the loss to be backpropagated. I’d recommend having a quick read through Basics · Flux to see how it works in practice. In short, both your model’s forward pass and the calculation of the loss function need to happen inside the callback in order to get gradients for the model parameters.
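For example, a minimal sketch of that pattern might look like this (the mse loss and the targets y are made up purely for illustration):

using Flux

model = Flux.Dense(3, 1)
x = rand(Float32, 3, 5)   # a batch of 5 inputs
y = rand(Float32, 1, 5)   # made-up targets

# Forward pass and loss both happen inside the callback,
# so the return value is a scalar and gradients are defined
gs = Flux.gradient(() -> Flux.Losses.mse(model(x), y), Flux.params(model))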

Thank you for your answer.
I currently don’t plan on setting up a loss function; I just need the gradients for now.
But you point out a very important aspect: the gradient function expects a scalar output.

On newer versions of Zygote/Flux, you would have gotten the error message:

julia> grad = Flux.gradient(x -> model(x), some_input) # Gradient evaluated at the inputs
ERROR: output an array, so the gradient is not defined. Perhaps you wanted jacobian.

and indeed Flux.jacobian(x -> model(x), some_input) works. The terminology is generally that the derivative of a scalar function (like a loss) is the “gradient” and the derivative of a vector function is the “jacobian,” so you just need the latter (and to upgrade to the latest version). Of course, if in the end you do have a loss function, you’ll just want to do a gradient rather than explicitly calculating the intermediate jacobian.
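For example, something like the following should give both flavours of Jacobian (the field names model.weight and model.bias assume a reasonably recent Flux version):

using Flux, Zygote

model = Flux.Dense(3, 1)
x = rand(Float32, 3)

# Jacobian w.r.t. the input: for a linear layer this is just W
Jx, = Zygote.jacobian(model, x)
Jx ≈ model.weight   # true

# Jacobian w.r.t. the parameters (implicit-params form)
Jp = Zygote.jacobian(() -> model(x), Flux.params(model))
Jp[model.weight]    # ∂y/∂W: should equal x as a 1×3 row
Jp[model.bias]      # ∂y/∂b: a 1×1 matrix containing 1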


Thanks for your answer. It actually works and returns the expected results.

I found an alternative while reading the Zygote docs, which looks something like the following:

grad = Flux.gradient(() -> sum(model(some_input)), Flux.params(model))

With the sum I’m able to provide the scalar output that gradient expects (following what @ToucheSir mentioned), and it also returns the expected results.
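As a double check against the math at the top: for a single input vector the gradients indeed come out as x and 1 (again, model.weight and model.bias assume a recent Flux version):

x1 = rand(Float32, 3)   # a single input vector
gs = Flux.gradient(() -> sum(model(x1)), Flux.params(model))

gs[model.weight] ≈ x1'   # ∂y/∂W = x, as a 1×3 row
gs[model.bias] ≈ [1f0]   # ∂y/∂b = 1

(With the batched some_input above, the weight gradient is instead the sum of the per-sample rows, since sum runs over the whole batch.)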

Thank you both for your input and answers.