Gradient in simple regression not working

I tried to replicate a simple regression model from Flux’s documentation with a slightly larger dataset, but gradient descent always seems to diverge. A manual implementation of the same algorithm works well, as sketched below.
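For context, the manual version follows this general pattern (a simplified sketch, not the exact code, assuming batch gradient descent on the halved mean squared error):

# Simplified sketch of the manual implementation (an assumption, not the
# exact code): batch gradient descent on sum((y .- ŷ).^2) / (2n).
function manual_fit(X, y; η = 0.0003, epochs = 100_000)
    n, k = size(X)
    W, b = rand(k), rand()
    for _ in 1:epochs
        r = X * W .+ b .- y        # residuals ŷ - y
        W .-= η .* (X' * r) ./ n   # ∂/∂W of the averaged loss
        b -= η * sum(r) / n        # ∂/∂b of the averaged loss
    end
    return W, b
end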

I would appreciate it if anyone could point out the problem.

Here is my code using Flux:


using Flux
using Flux.Tracker
using Flux.Tracker: update!
using RDatasets

trees = dataset("datasets", "trees");

X = Matrix(trees[!, [:Girth,:Height]])
y = trees[!, :Volume]
n = length(y)
nfeatures = size(X, 2)

W = rand(nfeatures)
b = rand()

predict(X, W, b) = X*W .+ b

function loss(X, y, W, b)
  ŷ = predict(X, W, b)
  sum((y .- ŷ).^2)
end

loss(X, y, W, b)
# ~ 27139.27213905282

W = param(W)
b = param(b)
gs = Tracker.gradient(() -> loss(X, y, W, b), params(W, b))
# Update the parameters and reset the gradients
update!(W, -0.0003gs[W])
update!(b, -0.0003gs[b])
loss(X, y, W, b)
# ~ 2.6542565871928763e8 (tracked); the loss increased by about four orders of magnitude

Are you sure your step size is good and has the correct sign?

It has the correct sign, following the example in the docs. It is also the step size I use in the manual implementation of the model, and there it works. I have additionally tried a step size an order of magnitude smaller, and it still diverges.
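For a quadratic loss like this one, the largest stable step size for plain gradient descent can also be computed directly from the Hessian. A quick check (a sketch, using the X, y, and n defined above and the summed-squares loss):

using LinearAlgebra

A = hcat(X, ones(n))              # design matrix with a column for the bias
H = 2 * A' * A                    # Hessian of sum((y .- ŷ).^2) w.r.t. [W; b]
η_max = 2 / eigmax(Symmetric(H))  # gradient descent diverges above this step size

If 0.0003 is larger than η_max for this data, divergence is expected regardless of how the gradients are computed.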

The best way forward is probably to gradually make the Flux implementation look more like the manual version, or vice versa, until you figure out what the difference between the two is. You’re already calculating gradients directly, so perhaps check that they line up with your other version?
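For example, something like this (a sketch, reusing the tracked W and b and the loss from the first post; Tracker.data strips the tracking so plain values come out):

# Hand-derived gradients of sum((y .- ŷ).^2)
r = X * Tracker.data(W) .+ Tracker.data(b) .- y   # residuals ŷ - y
gW_manual = 2 .* (X' * r)
gb_manual = 2 * sum(r)

# Tracker's gradients; the two should agree element-wise
gs = Tracker.gradient(() -> loss(X, y, W, b), params(W, b))
isapprox(Tracker.data(gs[W]), gW_manual)   # expect true
isapprox(Tracker.data(gs[b]), gb_manual)   # expect true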


Thanks for your reply.
I changed the loss function to the following:

function loss(X, y, W, b)
  ŷ = predict(X, W, b)
  sum((y .- ŷ).^2) / (2*length(y))
end

And the gradient updates work now. However, I am not sure why taking the average of the sum of squares is important for them to work.
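Writing out the gradients by hand shows that the averaging does not change them qualitatively, it only rescales them by a constant:

$$
L_{\mathrm{sum}}(W, b) = \sum_{i=1}^{n} (y_i - \hat y_i)^2
\quad\Rightarrow\quad
\nabla_W L_{\mathrm{sum}} = -2\, X^\top (y - \hat y)
$$

$$
L_{\mathrm{mean}}(W, b) = \frac{1}{2n} \sum_{i=1}^{n} (y_i - \hat y_i)^2
\quad\Rightarrow\quad
\nabla_W L_{\mathrm{mean}} = -\frac{1}{n}\, X^\top (y - \hat y)
$$

Both point in the same direction, but with n = 31 the summed loss produces gradients 2n = 62 times larger, so a step size of 0.0003 on it acts like a step of roughly 0.019 on the averaged loss. The averaging is not needed for the gradients to be correct; it just shrinks the effective step size back into the range where gradient descent is stable.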