Flux function fitting

jstrube · August 7, 2020, 4:45am

I’m trying to reproduce the results from this TensorFlow tutorial:

The objective is a nonlinear regression. Sounds simple enough, but I’m struggling to reproduce the results from the web page. My attempt looks like this:

```julia
using Flux
using StatsPlots
using IterTools: ncycle

xvals = collect(Float32, range(-10.0, 10.0, length=1000))
xvals = reshape(xvals, (1, 1000))
yvals = 0.1f0.*xvals'.*cos.(xvals') + 0.1f0*randn(Float32, length(xvals))
scatter(xvals', yvals, ms=0.1, linewidth=0, markerstrokewidth=0, legend=nothing)

approx = Chain(
    Dense(1, 64),
    Dense(64, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1)
)
loss(x, y) = Flux.Losses.mse(x', y)
trainer = Flux.Data.DataLoader((xvals, yvals[:, 1]), shuffle=true)
Flux.train!(loss, params(approx), ncycle(trainer, 100), ADAM())

scatter(xvals', yvals, ms=0.1, linewidth=0, markerstrokewidth=0, legend=nothing)
plot!(xvals', approx(xvals)')
```

I’m using the same layers, activation functions and optimizer as the example (at least I think I am).
Unfortunately, the result isn’t even close. In 100 epochs, the example fits the shape reasonably well, but the code I posted here basically just gives two straight lines, one for negative x-values and one for positive x-values.
I could always try different layer structures and different optimizers, but I feel like I’m missing something obvious, because this is a simple example that I should be able to reproduce. It works in TensorFlow, so there is no reason it shouldn’t work in Flux. Thank you for any suggestions on how to get closer to the TensorFlow performance.

Edit: I’ve also looked at the following posts, which contain useful information on how to solve the problem in general, but I couldn’t find an explanation of why the performance here is so different from what the web page reports.

contradict · August 7, 2020, 6:02am

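One important typo: your loss function compares x directly with y, with no model in between, so the loss does not depend on the network’s parameters and training never changes them. What you are plotting is the untrained network. Since Flux’s Dense layers start with zero biases, that network is linear on each side of x = 0, which gives the straight lines you see. Try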

```julia
loss(x, y) = Flux.Losses.mse(approx(x)', y)
```
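The "straight lines" behaviour of the untrained network can be checked without Flux. The sketch below is a hypothetical plain-Julia stand-in for the question’s Chain: same layer sizes and activations, random Gaussian weights (an assumption; Flux actually uses Glorot initialisation), and all biases zero, as in Flux’s default. Because relu(a*z) = a*relu(z) for a > 0 and no layer adds a bias, net(a*x) = a*net(x), so the output is a line through the origin on each side of x = 0, generally with a different slope on each side.

```julia
using Random

Random.seed!(1)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the untrained Chain: random weights, zero biases.
const W1 = randn(64, 1)
const W2 = randn(64, 64)
const W3 = randn(64, 64)
const W4 = randn(1, 64)
relu(z) = max(z, zero(z))

# Forward pass for a scalar input: identity, relu, relu, identity
net(x::Real) = (W4 * relu.(W3 * relu.(W2 * (W1 * [x]))))[1]

# With zero biases, scaling the input by a > 0 scales the output by a,
# so the ratio below should be 3 (up to rounding) on both sides of 0.
for x in (1.0, -1.0)
    println("x = $x: net(3x) / net(x) = ", net(3x) / net(x))
end
```

The slope for x > 0 is net(1.0) and the slope for x < 0 is -net(-1.0); training is what would add the nonzero biases needed to bend the curve elsewhere.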

There are also two less important changes.

Keras’s fit uses a batch size of 32 by default, whereas Flux’s DataLoader uses batchsize=1 (one sample per update) unless you ask for more:

```julia
trainer = Flux.Data.DataLoader((xvals, yvals[:, 1]), shuffle=true, batchsize=32)
```
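As a rough sketch of what batchsize=32 with shuffle=true means for one epoch, the hypothetical helper minibatch_indices below shuffles the 1000 sample indices and cuts them into chunks of 32, keeping the final short chunk as DataLoader does by default (partial=true). It is an illustration of the idea, not Flux’s implementation.

```julia
using Random

# Hypothetical helper: one epoch of shuffled minibatch indices.
# The last chunk may be shorter than `batchsize` (it is kept, not dropped).
function minibatch_indices(n::Integer, batchsize::Integer; rng=Random.default_rng())
    perm = randperm(rng, n)
    return [perm[i:min(i + batchsize - 1, n)] for i in 1:batchsize:n]
end

batches = minibatch_indices(1000, 32)
println("batches per epoch: ", length(batches))           # 32
println("size of last batch: ", length(last(batches)))     # 8
println("updates in 100 epochs: ", 100 * length(batches))  # 3200
```

With batchsize=1 the same epoch would be 1000 single-sample updates; with 32, each update averages the loss over up to 32 samples, which gives a less noisy gradient per step.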

Flux’s ADAM defaults to an initial learning rate of 0.001 (Keras’s Adam defaults to 0.001 as well); raising it to 0.01 makes each step about ten times larger:

```julia
Flux.train!(loss, params(approx), ncycle(trainer, 100), ADAM(0.01))
```
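A hand-written sketch of the standard Adam update shows why the learning rate has such a direct effect: on the first step the bias-corrected moment estimates reduce the update to roughly -η * sign(g), whatever the magnitude of the gradient g. The function adam_first_step is hypothetical, and the beta1, beta2 and epsilon values are the commonly used defaults, assumed here rather than taken from Flux or Keras.

```julia
# Hypothetical sketch: the first Adam update for a single scalar parameter,
# starting from zero moment estimates (standard Adam rule).
function adam_first_step(g; eta=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8)
    m = (1 - beta1) * g          # first-moment estimate after one step
    v = (1 - beta2) * g^2        # second-moment estimate after one step
    mhat = m / (1 - beta1)       # bias correction for t = 1, gives ≈ g
    vhat = v / (1 - beta2)       # bias correction for t = 1, gives ≈ g^2
    return -eta * mhat / (sqrt(vhat) + epsilon)  # change applied to the parameter
end

# The step size tracks eta, not the size of the gradient.
for g in (1e-3, 5.0), eta in (0.001, 0.01)
    println("g = $g, eta = $eta: step = ", adam_first_step(g; eta=eta))
end
```

Later steps depend on the gradient history, but the update stays proportional to η, so a fixed budget of 100 epochs covers much more ground at 0.01 than at 0.001.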

After these fixes, I see an acceptable fit.
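For reference, here is a sketch of the whole fit with all three changes applied, written for recent Flux releases, which favour Adam, Flux.setup and an explicit Flux.train! over the ADAM/params style used in this thread. This assumes a Flux version that provides Flux.setup and the Dense(in => out) syntax. The data are kept as 1×N matrices so the loss needs no transposes, and the seed is arbitrary.

```julia
using Flux
using Random

Random.seed!(42)  # arbitrary seed, only for reproducibility

# Same data as in the question, with x and y both shaped (1, 1000)
xvals = reshape(collect(Float32, range(-10, 10, length=1000)), 1, :)
yvals = 0.1f0 .* xvals .* cos.(xvals) .+ 0.1f0 .* randn(Float32, 1, 1000)

model = Chain(
    Dense(1 => 64),
    Dense(64 => 64, relu),
    Dense(64 => 64, relu),
    Dense(64 => 1),
)

# Fix 1: the loss calls the model, so it depends on the parameters
loss(m, x, y) = Flux.Losses.mse(m(x), y)

# Fix 2: shuffled minibatches of 32 samples
loader = Flux.DataLoader((xvals, yvals), batchsize=32, shuffle=true)

# Fix 3: Adam with learning rate 0.01
opt_state = Flux.setup(Adam(0.01), model)

initial_loss = loss(model, xvals, yvals)
for epoch in 1:100
    # One pass over the data; each batch is splatted as loss(model, x, y)
    Flux.train!(loss, model, loader, opt_state)
end
final_loss = loss(model, xvals, yvals)
println("loss before: ", initial_loss, ", after: ", final_loss)
```

The fitted curve can then be plotted as in the question, for example with plot!(vec(xvals), vec(model(xvals))).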

jstrube · August 7, 2020, 1:24pm

Urgh. I knew I was missing something obvious. Of course Flux can only train the network if the loss function actually depends on the network parameters…
Thank you very much for taking a look and helping out!
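The point above, that Flux can only train parameters the loss actually depends on, can be checked directly: Zygote, which Flux.gradient uses by default, returns nothing as the gradient with respect to a model the loss never touches. A sketch, assuming a recent Flux with the explicit gradient style; bad_loss and good_loss are illustrative names.

```julia
using Flux

model = Chain(Dense(1 => 64), Dense(64 => 64, relu), Dense(64 => 64, relu), Dense(64 => 1))
x = reshape(Float32[-1.0, 0.5, 2.0], 1, :)
y = reshape(Float32[0.1, -0.2, 0.3], 1, :)

# Like the question's loss: the model argument is never used
bad_loss(m, x, y) = Flux.Losses.mse(x, y)
# Fixed loss: the prediction m(x) is compared with y
good_loss(m, x, y) = Flux.Losses.mse(m(x), y)

g_bad = Flux.gradient(m -> bad_loss(m, x, y), model)[1]
g_good = Flux.gradient(m -> good_loss(m, x, y), model)[1]

println("gradient with the buggy loss: ", g_bad)  # nothing
println("size of last-layer weight gradient: ", size(g_good.layers[4].weight))
```

With a gradient of nothing there is nothing for the optimiser to apply, so the parameters stay at their initial values however many epochs are run.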
