Flux function fitting

I’m trying to reproduce the results from this TensorFlow tutorial:

The objective is a nonlinear regression. Sounds simple enough, but I’m struggling to reproduce the results from the web page. My attempt looks like this:

```julia
using Flux
using StatsPlots
using IterTools: ncycle

# 1000 evenly spaced inputs, stored as a 1×1000 matrix (features × samples)
xvals = collect(Float32, range(-10.0, 10.0, length=1000))
xvals = reshape(xvals, (1, 1000))
# noisy targets y = 0.1 x cos(x) + noise, as a 1000×1 matrix
yvals = 0.1f0 .* xvals' .* cos.(xvals') + 0.1f0 * randn(Float32, length(xvals))
scatter(xvals', yvals, ms=0.1, linewidth=0, markerstrokewidth=0, legend=nothing)

approx = Chain(
    Dense(1, 64),
    Dense(64, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1)
)
loss(x, y) = Flux.Losses.mse(x', y)
trainer = Flux.Data.DataLoader((xvals, yvals[:, 1]), shuffle=true)
Flux.train!(loss, params(approx), ncycle(trainer, 100), ADAM())

scatter(xvals', yvals, ms=0.1, linewidth=0, markerstrokewidth=0, legend=nothing)
plot!(xvals', approx(xvals)')
```

I’m using the same layers, activation functions and optimizer as the example (at least I think I am).
Unfortunately, the result isn’t even close. In 100 epochs, the example fits the shape reasonably well, but the code I posted here basically just gives two straight lines, one for negative x-values and one for positive x-values.
I could always try different layer structures and optimizers, but I feel like I’m missing something obvious, because this is a simple example that I should be able to reproduce. It works in TensorFlow, so there is no reason it shouldn’t work in Flux. Thank you for any suggestions on how to get closer to the TensorFlow performance.

Edit: I’ve also looked at the following posts, which contain useful information on how to solve this kind of problem in general, but I couldn’t find an explanation for why my results differ so much from those on the web page.

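One important typo: your loss function compares `x` with `y` directly, with no model in between. The loss therefore does not depend on the network's parameters, training never changes the network, and the plot simply shows the untrained network, which is why you get straight lines. Try: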

```julia
loss(x, y) = Flux.Losses.mse(approx(x)', y)
```
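The straight lines themselves are a property of the untrained network. Flux initializes `Dense` biases to zero, and `relu` satisfies relu(c·z) = c·relu(z) for c > 0, so this `Chain` obeys f(c·x) = c·f(x) for every c > 0 before any training: it is linear on each side of x = 0, i.e. two rays meeting at the origin. A minimal check, assuming Flux 0.13 or later (for the `Dense(in => out)` syntax); `net` and `xs` are illustrative names:

```julia
using Flux

# Same architecture as in the question, freshly initialized and never trained.
net = Chain(
    Dense(1 => 64),
    Dense(64 => 64, relu),
    Dense(64 => 64, relu),
    Dense(64 => 1),
)

# A few positive inputs, shaped (features × samples) = (1, 4).
xs = reshape(Float32[0.5, 1.0, 3.0, 7.5], 1, :)

# With zero biases and relu(c*z) == c*relu(z) for c > 0, scaling the input by a
# positive factor scales the output by the same factor on each side of x = 0.
pos_ok = net(2f0 .* xs) ≈ 2f0 .* net(xs)      # linear for x > 0
neg_ok = net(-2f0 .* xs) ≈ 2f0 .* net(-xs)    # linear for x < 0

println("linear for x > 0: ", pos_ok, "; linear for x < 0: ", neg_ok)
```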

There are also two less important differences:

  1. Keras uses a batch size of 32 by default, whereas Flux's `DataLoader` uses a batch size of 1 unless you ask for more:
     ```julia
     trainer = Flux.Data.DataLoader((xvals, yvals[:, 1]), shuffle=true, batchsize=32)
     ```
  2. Use an initial learning rate of 0.01 for ADAM rather than Flux's default of 0.001. (Keras's `Adam` also defaults to 0.001, so this is a change of hyperparameter rather than a matching default.)
     ```julia
     Flux.train!(loss, params(approx), ncycle(trainer, 100), ADAM(0.01))
     ```

After these fixes, I see an acceptable fit.
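For reference, here is a minimal sketch combining the three fixes. It assumes a recent Flux (0.14 or later), where the implicit `params`-based `train!` is deprecated in favour of the explicit form `Flux.train!(loss, model, data, opt_state)` with `opt_state = Flux.setup(Adam(0.01), model)`, `Flux.DataLoader` replaces `Flux.Data.DataLoader`, and `Adam` is the newer spelling of `ADAM`. Plotting is omitted, `ncycle` is replaced by a plain epoch loop, both `x` and `y` are kept as 1×N matrices, and the names `x`, `y`, `model`, `loader` are illustrative:

```julia
using Flux
using Random

Random.seed!(0)  # fixed seed only so runs are repeatable

# Data: y = 0.1 x cos(x) + noise, both stored as 1×N arrays (features × samples).
x = reshape(collect(Float32, range(-10, 10; length=1000)), 1, :)
y = 0.1f0 .* x .* cos.(x) .+ 0.1f0 .* randn(Float32, size(x))

model = Chain(
    Dense(1 => 64),
    Dense(64 => 64, relu),
    Dense(64 => 64, relu),
    Dense(64 => 1),
)

# Fix 1: the loss calls the model, so it depends on the parameters.
loss(m, xb, yb) = Flux.Losses.mse(m(xb), yb)

# Fix 2: mini-batches of 32 instead of Flux's default batch size of 1.
loader = Flux.DataLoader((x, y); batchsize=32, shuffle=true)

# Fix 3: ADAM with initial learning rate 0.01 (Flux's default is 0.001).
opt_state = Flux.setup(Adam(0.01), model)

initial_loss = loss(model, x, y)
for epoch in 1:100
    # One pass over all mini-batches; train! calls loss(model, xb, yb) per batch.
    Flux.train!(loss, model, loader, opt_state)
end
final_loss = loss(model, x, y)
println("MSE before: ", initial_loss, "  after: ", final_loss)
```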


Urgh. I knew I was missing something obvious. Of course Flux can only train the network if the loss function actually depends on the network parameters…
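A cheap way to catch this kind of bug before training is to take the gradient of the loss with respect to the model and check that it is not empty. A minimal sketch, assuming the explicit `Flux.gradient(f, model)` form of recent Flux versions, which is backed by Zygote and yields `nothing` for an argument the function never uses; `bad_loss` and `good_loss` are hypothetical names:

```julia
using Flux

model = Chain(Dense(1 => 64), Dense(64 => 64, relu), Dense(64 => 64, relu), Dense(64 => 1))
x = reshape(collect(Float32, range(-10, 10; length=8)), 1, :)
y = 0.1f0 .* x .* cos.(x)

# The original bug: the model argument `m` is never used.
bad_loss(m, x, y) = Flux.Losses.mse(x, y)
# The fix: the prediction comes from the model.
good_loss(m, x, y) = Flux.Losses.mse(m(x), y)

g_bad = Flux.gradient(m -> bad_loss(m, x, y), model)[1]
g_good = Flux.gradient(m -> good_loss(m, x, y), model)[1]

# Zygote reports `nothing` because no parameter affects the buggy loss.
println("gradient with buggy loss: ", g_bad)
println("fixed loss has a gradient: ", g_good !== nothing)
```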
Thank you very much for taking a look and helping out!