Flux confused me for a long time. I got the first example in the docs to work:
```julia
using Flux.Tracker

W = param(rand(2, 5))
b = param(rand(2))

predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3

params = Params([W, b])
grads = Tracker.gradient(() -> loss(x, y), params)
```
But other examples use a `params` function, which seems to be required in order to use optimizers like SGD and ADAM.
I spent a few hours trying to use ADAM to optimize a simple example, with no luck. Flux's optimizers seem to require a special "layer" — not just a plain function, but something wrapped so that Flux can discover its parameters.
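For reference, here is roughly the kind of thing I was attempting (a sketch only — it assumes the Flux 0.8-era Tracker API, where `Flux.Optimise.update!(opt, ps, gs)` applies one optimizer step to a `Params` collection; I'm not sure this is the intended way):

```julia
using Flux, Flux.Tracker
using Flux.Optimise: update!

W = param(rand(2, 5))
b = param(rand(2))
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2)          # dummy data
ps = Params([W, b])
opt = ADAM(0.1)

l0 = Tracker.data(loss(x, y))    # loss before training
for _ in 1:200
    gs = Tracker.gradient(() -> loss(x, y), ps)
    update!(opt, ps, gs)         # in-place ADAM step on W and b
end
l1 = Tracker.data(loss(x, y))    # loss after training
```

Driving `update!` by hand like this sidesteps `Flux.train!`, but the docs' optimizer examples all go through model objects, which is where I get lost.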
Is there a way to “lift” an arbitrary function so Flux will treat it as a layer?