Problem with first example with Flux

Hello,

I am trying the very first example in Flux, from Overview · Flux. The code is given below. It works fine with hcat(0:5...) but fails with hcat(0:6...). Why?

using Flux
using Flux: train!
import Random
Random.seed!(1)

actual(x) = 4x + 2                          # ground-truth function to learn
loss(x, y) = Flux.Losses.mse(predict(x), y) # mean squared error of the model
x_train = hcat(0:5...)                      # works; hcat(0:6...) produces NaN
x_test = hcat(7:14...)
y_train = actual.(x_train)
y_test = actual.(x_test)
print(y_train, "\n", y_test, "\n")
predict = Dense(1 => 1)                     # one-input, one-output linear layer
opt = Descent()                             # plain gradient descent, default step 0.1
data = [(x_train, y_train)]
parameters = Flux.params(predict)
predict.weight in parameters, predict.bias in parameters  # confirm both are tracked
for epoch in 1:200
    train!(loss, parameters, data, opt)
end
print(loss(x_train, y_train), "\n")
parameters

Output with hcat(0:5...)

[2 6 10 14 18 22]
[30 34 38 42 46 50 54 58]
0.009775136
Params([Float32[3.9697118;;], Float32[1.9914621]])

Output with hcat(0:6...)

[2 6 10 14 18 22 26]
[30 34 38 42 46 50 54 58]
NaN
Params([Float32[NaN;;], Float32[NaN]])

Try adding the line @show parameters in your training loop and you'll see how the parameters oscillate with larger and larger steps. One way to get around this is to take smaller steps in your optimizer; for Descent the default step size is 0.1, so I would suggest trying something smaller than that.
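For example, something like this (a minimal sketch reusing the definitions from your post; the only substantive changes are the 0:6 range, the smaller step size passed to Descent, and the @show):

using Flux
using Flux: train!

actual(x) = 4x + 2
predict = Dense(1 => 1)
loss(x, y) = Flux.Losses.mse(predict(x), y)
x_train = hcat(0:6...)                   # the range that diverged before
y_train = actual.(x_train)
data = [(x_train, y_train)]
parameters = Flux.params(predict)
opt = Descent(0.01)                      # smaller step size than the default 0.1
for epoch in 1:200
    train!(loss, parameters, data, opt)
    epoch % 50 == 0 && @show parameters  # updates now shrink instead of blowing up
end

With the smaller step you may need more epochs to reach the same loss, but the updates stay stable.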

The reason it happens in this specific case is that the magnitude of the error is

|wx + b - 4x - 2| = |(w - 4)x + b - 2|

which, as long as w ≠ 4, grows with larger x, and the gradients grow with it. So for this model it is expected that if you train on a range x = 0:n, there is some n beyond which the gradient steps become large enough to cause unstable updates that just grow to infinity. Lowering the optimizer step size allows for a larger n, though you will still hit a new, higher limit at some point.
