Problem with first example with Flux


I am trying the very first example from the Flux docs (Overview · Flux). The code is given below. It works fine with x_train = hcat(0:5...) but not with hcat(0:6...). Why?

using Flux
using Flux: train!
import Random

actual(x) = 4x + 2
loss(x, y) = Flux.Losses.mse(predict(x), y)
x_train = hcat(0:5...)
x_test = hcat(7:14...)
y_train = actual.(x_train)
y_test = actual.(x_test)
print(y_train, "\n", y_test, "\n")
predict = Dense(1 => 1)
opt = Descent()
data = [(x_train, y_train)]
parameters = Flux.params(predict)
predict.weight in parameters, predict.bias in parameters
for epoch in 1:200
    train!(loss, parameters, data, opt)
end
print(loss(x_train, y_train), "\n")

Output with 5

[2 6 10 14 18 22]
[30 34 38 42 46 50 54 58]
Params([Float32[3.9697118;;], Float32[1.9914621]])

Output with 6

[2 6 10 14 18 22 26]
[30 34 38 42 46 50 54 58]
Params([Float32[NaN;;], Float32[NaN]])

Try adding the line @show parameters in your training loop and you'll see how the parameters oscillate with larger and larger steps. One way to get around this is to take smaller steps in your optimizer; for Descent the default learning rate is 0.1, so I would suggest trying something smaller than that.
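A minimal sketch of that suggestion, assuming the rest of the setup from the question is unchanged (the 0.01 value is just an illustration, not a tuned choice):

```julia
using Flux

# Descent takes the learning rate as its only argument; the default is 0.1.
# Something smaller, e.g. 0.01, keeps the updates stable for x_train = hcat(0:6...).
opt = Descent(0.01)
```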

The reason it happens in this specific case is that the magnitude of the error, |predict(x) − actual(x)| = |(w − 4)x + (b − 2)|, grows with larger x, and since the MSE gradient with respect to the weight scales like x times the error, the gradients become increasingly large with larger x. Therefore it is expected for this model that if you take a range x = 0:n, there will be some n after which the gradient steps are large enough to cause unstable updates that just grow to infinity. Lowering the optimizer step size will allow for larger n, though you will still hit a new, higher limit at some point.
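You can see this without Flux at all by hand-rolling gradient descent on the same least-squares problem. This is my own sketch (the `fit` helper and the zero initialization are assumptions, not from the original post); it reproduces the behaviour: 0:5 converges at the default step size 0.1, 0:6 blows up, and a smaller step size makes 0:6 converge again.

```julia
actual(x) = 4x + 2

# Plain gradient descent on mse(w*x + b, actual(x)) over x = 0:n.
function fit(n, lr; epochs = 200)
    xs = collect(0.0:n)
    ys = actual.(xs)
    w, b = 0.0, 0.0                              # deterministic init instead of Flux's random one
    for _ in 1:epochs
        err = @. w * xs + b - ys                 # (w - 4)x + (b - 2): grows with x
        w -= lr * 2 * sum(err .* xs) / length(xs)  # ∂mse/∂w ~ x * err, so ~ x²
        b -= lr * 2 * sum(err) / length(xs)        # ∂mse/∂b ~ err, so ~ x
    end
    (w, b)
end

fit(5, 0.1)                  # settles near (4, 2), like the Params output above
fit(6, 0.1)                  # updates overshoot and grow without bound
fit(6, 0.01; epochs = 2000)  # smaller steps converge, but need more epochs
```

With 0:6 and step size 0.1 the update matrix has an eigenvalue of magnitude above 1, so each step overshoots by more than the last; Flux additionally uses Float32, so the exploding weights overflow to Inf and then turn into the NaN you saw.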