I’ve been trying to get a feel for Flux by modeling a simple non-linear function (x^2).
At the end of each epoch, I print the model output for 3 (expecting 9), but it seems to converge to ~34.
Can anyone see what I’m doing wrong?
using Flux, Printf        # Printf provides @printf; @progress below comes from Juno
using Flux: Tracker

# NOTE: the Chain definition was cut off in the original post; the layer
# sizes here are a guess. Per the replies below, the layers had no
# non-linear activation, so plain Dense layers are used.
model = Chain(Dense(1, 10), Dense(10, 1))
x = collect(-10:.1:10)'   # 1×201 row matrix of inputs
y = x.^2
N = length(y)
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM()
epochs = 15
ps = Flux.params(model)
@progress for epoch = 1:epochs
    for i = 1:N
        gs = Flux.Tracker.gradient(() -> loss(x[:, i], y[i]), ps)
        Flux.Tracker.update!(opt, ps, gs)   # ps is already a Params; no need to re-wrap it
    end
    @printf "Epoch: %d 3^2 = %1.2f\n" epoch model([3.0]).data[1]
end
Is x a 1-dimensional vector?
EDIT: Nevermind, missed the transpose.
I think the main thing you’re missing is a non-linear activation function. You can’t replicate the square of a number by simply taking a linear combination of itself.
With a relu added, I got decent convergence after 100 epochs with your example.
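For reference, a minimal sketch of that change (the layer widths are illustrative, since the original Chain definition was truncated):

model = Chain(Dense(1, 10, relu), Dense(10, 1))   # relu on the hidden layer makes the model non-linear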
I also tried using batch gradients, but convergence was very slow… about 10,000 epochs. There’s probably some tuning required to make it faster.
ADAM(0.1) with batch gradients gets very good convergence by 1000 epochs and runs far faster than iterating through each data point.
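A rough sketch of that batch version, reusing the loss, params, and data from the original post (the epoch count follows the ~1000 figure above):

opt = ADAM(0.1)                  # larger learning rate than the ADAM() default
for epoch = 1:1000
    # one gradient step over the full dataset per epoch, instead of N per-sample steps
    gs = Flux.Tracker.gradient(() -> loss(x, y), ps)
    Flux.Tracker.update!(opt, ps, gs)
end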
As already stated, you’ll need a non-linear activation like relu. Additionally, I suspect the range of your input data, [-10, 10], is too large for training, and that successive samples are too correlated. I would suggest something along the lines of:
x = randn(1, nr_of_samples_you_want)
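For example (the sample count here is arbitrary):

x = randn(1, 1000)   # standard-normal inputs, mostly within about [-3, 3]
y = x.^2             # targets, same as in the original example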
Hope this improves things a bit.
Ahh. I thought the default activation function was sigmoid. I see now that it’s the identity function.
(and using x = randn(1, nr_of_samples_you_want) helped also)
Thank you both.