Hello everyone.
I’m attempting to obtain the gradient of a Flux model with respect to the weights.
I first want to show what I mean (or want to achieve).
Suppose I have a very simple linear model such that

$$f(x) = W x + b,$$

where $W \in \mathbb{R}^{1 \times 3}$ and $b \in \mathbb{R}$. Now, using this model I wish to obtain the following

$$\frac{\partial f}{\partial W} = x^\top$$

and

$$\frac{\partial f}{\partial b} = 1.$$

Further, in the case of a nonlinear model, given a nonlinear activation function $\sigma(x)$, I have the nonlinear model

$$g(x) = \sigma(W x + b),$$

and I wish to obtain the following

$$\frac{\partial g}{\partial W} = \sigma'(W x + b)\, x^\top$$

and

$$\frac{\partial g}{\partial b} = \sigma'(W x + b).$$
I hope I haven’t made a mistake somewhere in my computations; please correct me if I have.
Anyway, both the linear and the nonlinear model can be readily implemented using the `Dense` layer from Flux.
What I currently have is the following MWE:

```julia
using Flux
using Random

Random.seed!(8129)

# Create a very simple model
# Note that this is a *linear* model !!
model = Flux.Dense(3, 1)
baseline = Flux.params(model)
display(baseline)

# Compute the gradient wrt the weights
# We should be able to obtain the same parameters as before!
some_input = rand(3, 3)
some_output = model(some_input)
display(some_output)

grad = Flux.gradient(x -> model(x), some_input) # Gradient evaluated at the inputs
display(grad)
```
but the output is just an error:

```
ERROR: LoadError: Output should be scalar; gradients are not defined for output Float32[0.28850588 0.8061751 0.3004352]
```
How can I obtain the derivatives I’m looking for?
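For what it’s worth, here is a minimal sketch of one workaround I have been experimenting with (this is an assumption on my part, not something from the error message): `Flux.gradient` needs a scalar output, so for a single-output layer one can differentiate a scalar wrapper such as `sum(model(x))` with respect to the implicit parameters from `Flux.params`. For the linear model above, this should recover exactly the derivatives $x^\top$ and $1$:

```julia
using Flux

# A single-output linear layer, as in the MWE above
model = Flux.Dense(3, 1)

# A single input vector (rather than a batch), so the output is one number
x = rand(Float32, 3)

# Differentiate a *scalar* function of the output with respect to the
# parameters. With a single output, sum(model(x)) == w ⋅ x + b, so the
# gradient wrt the weight matrix is x' and the gradient wrt the bias is 1.
ps = Flux.params(model)
grads = Flux.gradient(() -> sum(model(x)), ps)

display(grads[model.weight])  # expect x' (a 1×3 row)
display(grads[model.bias])    # expect [1]
```

For batched inputs or multiple outputs this scalarization mixes the outputs together, so a full Jacobian (e.g. via Zygote) would presumably be needed instead; I’d welcome corrections on whether this is the idiomatic approach.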