Ok, so the docs have this example:
opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
opt() # Carry out the update, modifying `W` and `b`.
So let’s try it for the case above:
using Flux
using Flux.Tracker
using Flux.Optimise
W = param(rand(2, 5))
b = param(rand(2))
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)
x, y = rand(5), rand(2) # Dummy data
par = Params([W, b])
grads = Tracker.gradient(() -> loss(x, y), par)
opt = SGD(par, 0.1)
opt()
This fails with
MethodError: no method matching length(::Params)
Closest candidates are:
length(!Matched::Core.SimpleVector) at essentials.jl:576
length(!Matched::Base.MethodList) at reflection.jl:728
length(!Matched::Core.MethodTable) at reflection.jl:802
...
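If I had to guess, the MethodError is just because this SGD constructor wants a plain array of tracked parameters (as in the docs snippet) rather than a Params object, i.e. something like:

opt = SGD([W, b], 0.1) # pass the array itself, not Params([W, b]) - just a guess
opt()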
But even then it doesn’t make sense to me: the call only uses the parameters and never references the loss function.
It seems that for a proper Flux model, [W, b] or params(m) must somehow store the loss function under the hood. But it’s not clear how to make that connection for a hand-rolled loss function.
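Or is the idea that the gradients end up stored on the tracked arrays themselves after back-propagating the loss? If so, maybe the intended workflow for a hand-rolled loss is something like this (just guessing at the API here - back! and the .grad fields are my assumptions about Tracker):

# Guess: back-propagate the loss so the gradients land on W and b,
# then let the optimiser apply them without ever seeing the loss.
l = loss(x, y)
Tracker.back!(l)        # presumably fills in W.grad and b.grad
opt = SGD([W, b], 0.1)
opt()                   # update W and b in place from the stored gradients?

Or should I skip the optimiser entirely and apply the Grads object by hand, something like Tracker.update!(W, -0.1 * grads[W]) for each parameter?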