I tried to replicate a simple regression model from Flux’s documentation with a slightly larger dataset, but gradient descent always seems to diverge. A manual implementation of the same algorithm works well.
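For reference, a hand-written gradient descent for the same linear model could look like the sketch below. It is illustrative only and not necessarily the manual implementation mentioned above. The function name `gd`, the zero initialization and the default `η` and `iters` values are assumptions. It uses the same sum-of-squares loss and its analytic gradient instead of Tracker:

```julia
# Hypothetical sketch: fixed-step gradient descent for ŷ = X*W .+ b with the
# sum-of-squares loss L = sum((y .- ŷ).^2), using the analytic gradient.
function gd(X::AbstractMatrix, y::AbstractVector; η = 1e-6, iters = 1_000)
    W = zeros(size(X, 2))   # start from zero weights (an assumption)
    b = 0.0
    for _ in 1:iters
        r = y .- (X * W .+ b)   # residuals
        gW = -2 .* (X' * r)     # ∂L/∂W
        gb = -2 * sum(r)        # ∂L/∂b
        W .-= η .* gW           # step against the gradient
        b -= η * gb
    end
    return W, b
end
```

Because the loss is a sum rather than a mean, its gradient is `n` times larger than the mean-squared-error gradient, so `η` has to be correspondingly smaller.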

I would appreciate it if anyone could point out the problem.

Here is my code using Flux:

```julia
using Flux
using Flux.Tracker
using Flux.Tracker: update!
using RDatasets
trees = dataset("datasets", "trees");
X = Matrix(trees[!, [:Girth,:Height]])
y = trees[!, :Volume]
n = length(y)
nfeatures = size(X, 2)
W = rand(nfeatures)
b = rand()
predict(X, W, b) = X*W .+ b
# Sum of squared errors over all samples (not the mean)
function loss(X, y, W, b)
    ŷ = predict(X, W, b)
    sum((y .- ŷ).^2)
end
loss(X, y, W, b)
# ~ 27139.27213905282
W = param(W)
b = param(b)
gs = Tracker.gradient(() -> loss(X, y, W, b), params(W, b))
# Update the parameters and reset their gradients
update!(W, -0.0003gs[W])
update!(b, -0.0003gs[b])
loss(X, y, W, b)
# ~ 2.6542565871928763e8 (tracked)
```
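One quantity that may be worth checking: with a plain sum-of-squares loss, the Hessian with respect to `(W, b)` is the constant matrix `2 Z'Z`, where `Z` is `X` with a column of ones appended for the bias. Fixed-step gradient descent on such a quadratic converges only when the step size is below `2 / λ_max` of that Hessian. The sketch below computes this bound with the standard `LinearAlgebra` library. `max_stable_lr` is a hypothetical helper, not part of Flux; calling `max_stable_lr(X)` on the matrix above would give a value to compare with the `0.0003` used in the update step.

```julia
using LinearAlgebra

# Hypothetical helper: largest fixed step size for which gradient descent on
# L(W, b) = sum((y .- (X*W .+ b)).^2) does not diverge.
function max_stable_lr(X::AbstractMatrix)
    Z = hcat(X, ones(size(X, 1)))   # append a column of ones for the bias b
    H = 2 .* (Z' * Z)               # Hessian of the sum-of-squares loss (constant)
    λmax = eigmax(Symmetric(H))     # largest eigenvalue of the symmetric Hessian
    return 2 / λmax
end
```

Since `λ_max` grows with both the number of samples and the squared feature magnitudes, a step size that works on a small toy dataset can exceed this bound on a larger one with bigger feature values.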