Gradient of a Flux model with respect to the weights

Hello everyone.

I’m attempting to obtain the gradient of a Flux model with respect to its weights.
Let me first show what I mean (i.e., what I want to achieve).

Suppose I have a very simple linear model such that

y(x) = W \cdot x + b.

Now, using this model I wish to obtain the following

\frac{\partial y}{\partial W} = x

and

\frac{\partial y}{\partial b} = 1

Further, in the nonlinear case, given an activation function \sigma, I have the model

f(x) = \sigma \left( W \cdot x + b \right)

and, writing z = W \cdot x + b for the pre-activation, I wish to obtain the following

\frac{\partial f}{\partial W} = \frac{\partial f}{\partial z} \frac{\partial z}{\partial W} = \sigma'(W \cdot x + b) \, x

and

\frac{\partial f}{\partial b} = \frac{\partial f}{\partial z} \frac{\partial z}{\partial b} = \sigma'(W \cdot x + b)
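As a quick sanity check of these formulas, a finite-difference comparison seems to agree (here I take \sigma = \tanh, so \sigma'(z) = 1 - \tanh(z)^2, with arbitrary numbers):

# Sanity check of the chain rule above with σ = tanh,
# so σ'(z) = 1 - tanh(z)^2; W, b and x are arbitrary
W = rand(1, 3)
b = rand(1)
x = rand(3)

f(W, b) = only(tanh.(W * x .+ b))   # scalar output, since W has one row
z = only(W * x .+ b)
h = 1e-6

analytic_dfdb = 1 - tanh(z)^2                       # σ'(z)
numeric_dfdb  = (f(W, b .+ h) - f(W, b)) / h

E = zeros(1, 3); E[1, 1] = h                        # perturb W[1, 1] only
analytic_dfdW1 = (1 - tanh(z)^2) * x[1]             # σ'(z) ⋅ x₁
numeric_dfdW1  = (f(W .+ E, b) - f(W, b)) / h

isapprox(analytic_dfdb,  numeric_dfdb;  atol=1e-4)  # true
isapprox(analytic_dfdW1, numeric_dfdW1; atol=1e-4)  # true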

I hope I haven’t made a mistake somewhere in these computations; please correct me if I have.

Anyway, both linear and nonlinear models can be readily implemented using the Dense layer from Flux.
What I currently have is the following MWE:

using Flux
using Random

Random.seed!(8129)

# Create a very simple model
# Note that this is a *linear* model !!
model = Flux.Dense(3, 1)
baseline = Flux.params(model)
display(baseline)

# Compute the gradient w.r.t. the weights
# We should be able to obtain the same parameters as before!
some_input = rand(3, 3)
some_output = model(some_input)
display(some_output)
grad = Flux.gradient(x -> model(x), some_input) # Gradient evaluated at the inputs
display(grad)

but the output is just an error:

ERROR: LoadError: Output should be scalar; gradients are not defined for output Float32[0.28850588 0.8061751 0.3004352]

How can I obtain the derivatives I’m looking for?

gradient expects the callback to return a scalar output representing the loss to be backpropagated. I’d recommend having a quick read through Basics · Flux to see how it works in practice. In short, both your model’s forward pass and the calculation of the loss function need to happen inside the callback in order to get gradients for the model parameters.
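For example, a minimal sketch of that pattern might look like this (the mse loss and the targets y are made up purely for illustration):

using Flux

model = Flux.Dense(3, 1)
x = rand(Float32, 3, 5)   # a batch of 5 inputs
y = rand(Float32, 1, 5)   # made-up targets

# Forward pass and loss both happen inside the callback,
# so the return value is a scalar and gradients are defined
gs = Flux.gradient(() -> Flux.Losses.mse(model(x), y), Flux.params(model))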

Thank you for your answer.
I currently don’t plan on setting up a loss function; I just need the gradients for now.
But you point out a very important aspect: the gradient function expects a scalar output.

On newer versions of Zygote/Flux, you would have gotten the error message:

julia> grad = Flux.gradient(x -> model(x), some_input) # Gradient evaluated at the inputs
ERROR: output an array, so the gradient is not defined. Perhaps you wanted jacobian.

and indeed Flux.jacobian(x -> model(x), some_input) works. The terminology is generally that the derivative of a scalar function (like a loss) is the “gradient” and the derivative of a vector function is the “jacobian,” so you just need the latter (and to upgrade to the latest version). Of course, if in the end you do have a loss function, you’ll just want to do a gradient rather than explicitly calculating the intermediate jacobian.
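For example, something like the following should give both flavours of Jacobian (the field names model.weight and model.bias assume a reasonably recent Flux version):

using Flux, Zygote

model = Flux.Dense(3, 1)
x = rand(Float32, 3)

# Jacobian w.r.t. the input: for a linear layer this is just W
Jx, = Zygote.jacobian(model, x)
Jx ≈ model.weight   # true

# Jacobian w.r.t. the parameters (implicit-params form)
Jp = Zygote.jacobian(() -> model(x), Flux.params(model))
Jp[model.weight]    # ∂y/∂W: should equal x as a 1×3 row
Jp[model.bias]      # ∂y/∂b: a 1×1 matrix containing 1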


Thanks for your answer. It actually works and returns the expected results.

I found an alternative while reading the Zygote docs, which looks something like the following:

grad = Flux.gradient(() -> sum(model(some_input)), Flux.params(model))

With the sum I’m able to provide the scalar output that gradient expects (following what @ToucheSir mentioned), and it also returns the expected results.
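As a double check against the math at the top: for a single input vector the gradients indeed come out as x and 1 (again, model.weight and model.bias assume a recent Flux version):

x1 = rand(Float32, 3)   # a single input vector
gs = Flux.gradient(() -> sum(model(x1)), Flux.params(model))

gs[model.weight] ≈ x1'   # ∂y/∂W = x, as a 1×3 row
gs[model.bias] ≈ [1f0]   # ∂y/∂b = 1

(With the batched some_input above, the weight gradient is instead the sum of the per-sample rows, since sum runs over the whole batch.)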

Thank you both for your input and answers.