One important typo: your loss function compares x against y directly, with no model in between, which is what produces the nice line you get. Try:
loss(x, y) = Flux.Losses.mse(approx(x)', y)
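To make the difference concrete, here is a minimal sketch in plain Julia, without Flux. The `mse` here is hand-written, and `approx` is a hypothetical toy function standing in for your network, so the numbers are only illustrative. The transpose from the line above is left out because this toy model returns a plain vector.

```julia
# Hand-written mean squared error, standing in for Flux.Losses.mse.
mse(yhat, y) = sum(abs2, yhat .- y) / length(y)

# Hypothetical stand-in for the network `approx`; any callable would do.
approx(x) = 2 .* x .+ 1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]   # targets generated by the same rule as the toy model

# Buggy loss: the model never appears, so the value cannot depend on its
# parameters and training has nothing to adjust.
loss_buggy(x, y) = mse(x, y)

# Fixed loss: the model's prediction is compared against the target.
loss_fixed(x, y) = mse(approx(x), y)

println(loss_buggy(x, y))  # 7.5
println(loss_fixed(x, y))  # 0.0
```

The buggy version only measures how far the inputs are from the targets, which stays the same no matter what the model does.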
And two less important differences:
- Keras uses a batch size of 32 by default, whereas `Flux.DataLoader` does not batch unless you ask:
trainer = Flux.Data.DataLoader((xvals, yvals[:, 1]), shuffle=true, batchsize=32)
- Keras uses `0.01` as the default initial learning rate for ADAM, whereas Flux uses `0.001`:
Flux.train!(loss, params(approx), ncycle(trainer, 100), ADAM(0.01))
After these fixes, I see an acceptable fit.
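If it helps to see what `shuffle=true, batchsize=32` amounts to, here is a rough sketch in plain Julia. `minibatches` is a hypothetical helper written for this illustration, not part of Flux, and the sample count of 100 is an assumption.

```julia
using Random

# Hypothetical helper: shuffle the sample indices 1:n and split them into
# consecutive chunks of at most `batchsize` indices each.
function minibatches(n::Integer, batchsize::Integer; rng=Random.default_rng())
    idx = shuffle(rng, 1:n)
    return [collect(chunk) for chunk in Iterators.partition(idx, batchsize)]
end

batches = minibatches(100, 32)
println(length.(batches))  # [32, 32, 32, 4]
```

Each chunk corresponds to one parameter update, so the batch size changes how many updates happen per pass over the data.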