Simple Flux model not learning

I’ve been trying to get a feel for Flux by modeling a simple non-linear function (y = x^2).

At the end of each epoch, I print the model output for an input of 3 (expecting 9), but it seems to converge to ~34.

Can anyone see what I’m doing wrong?

using Flux
using Printf

model = Chain(
          Dense(1, 50),
          Dense(50, 1))

x = collect(-10:.1:10)'
y = x.^2
N = length(y)

loss(x, y) = Flux.mse(model(x), y)

opt = ADAM()
epochs = 15

ps = Flux.params(model)

@progress for epoch = 1:epochs  
  for i = 1:N
    gs = Flux.Tracker.gradient(() -> loss(x[:,i], y[i]), ps)
    Flux.Tracker.update!(opt, Tracker.Params(ps), gs) 
  end
  @printf "Epoch: %d  3^2 = %1.2f\n" epoch model([3]).data[1]
end

Isn’t x a 1-dimensional vector?

EDIT: Nevermind, missed the transpose.

I think the main thing you’re missing is a non-linear activation function. Without one, a stack of Dense layers collapses to a single affine map, so you can’t reproduce the square of a number from a linear combination of it.

By using relu, I got decent convergence after 100 epochs with your example.
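In case it helps, this is the only change I mean — the same layer sizes as in your post, with relu on the hidden layer:

model = Chain(
          Dense(1, 50, relu),   # hidden layer with a non-linear activation
          Dense(50, 1))         # linear output layer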

I also tried using batch gradients, but convergence was very slow… about 10,000 epochs. There’s probably some tuning required to make it faster.

EDIT: Yep, ADAM(0.1) with batch gradients gets very good convergence by 1000 epochs and runs far faster than iterating through each data point.
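For reference, a rough sketch of what I mean by batch gradients, reusing the loss, ps, and update call from your post (the higher learning rate and epoch count are the only other changes):

opt = ADAM(0.1)
for epoch = 1:1000
  # one gradient step on the full data set per epoch, instead of one per sample
  gs = Flux.Tracker.gradient(() -> loss(x, y), ps)
  Flux.Tracker.update!(opt, Tracker.Params(ps), gs)
end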

As already stated, you’ll need a non-linear activation such as relu. Additionally, I suspect the range of your input data, [-10, 10], is too large for training, and successive samples are too strongly correlated. I would suggest something along the lines of:

x = randn(1, nr_of_samples_you_want)
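For example (1000 is just an arbitrary sample count, and remember to rebuild y from the new x):

x = randn(1, 1000)   # 1×1000 matrix of standard-normal inputs, mostly in [-3, 3]
y = x.^2             # targets, same shape as x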

Hope this improves things a bit.

Ahh. I thought the default activation function was sigmoid. I see now that it’s the identity function.

(and using x = randn(1, nr_of_samples_you_want) helped also)

Thank you both.