Flux confused me for a long time. I got the first example in the docs to work:

```julia
using Flux.Tracker
W = param(rand(2, 5))
b = param(rand(2))
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)
x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3
ps = Params([W, b]) # renamed so it doesn't shadow Flux's params function
grads = Tracker.gradient(() -> loss(x, y), ps)
```

But other examples use a `params` function, which seems to be required to use optimizers like SGD and ADAM.

I spent a few hours trying to use ADAM to optimize a simple example, with no luck. Flux's optimization seems to require a special "layer", which is not just a function but requires some kind of wrapper.
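For what it's worth, the closest I've gotten is plain manual gradient descent with `Tracker.update!`, along the lines of the Tracker-era basics docs. This is only a sketch (the learning rate `η = 0.1` and the loop count are arbitrary choices of mine), and it sidesteps the optimizers entirely, which is exactly what I'd like to avoid:

```julia
using Flux, Flux.Tracker
using Flux.Tracker: update!

W = param(rand(2, 5))
b = param(rand(2))
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(5), rand(2) # Dummy data

η = 0.1 # learning rate (arbitrary)
for i in 1:100
    grads = Tracker.gradient(() -> loss(x, y), Params([W, b]))
    for p in (W, b)
        # Take a gradient step in place; update! also resets the stored gradient
        update!(p, -η .* grads[p])
    end
end
```

This converges on the toy problem, but it's hand-rolled SGD, not ADAM.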

Is there a way to "lift" an arbitrary function so Flux will treat it as a layer?