I’m trying to reproduce the results from this TensorFlow tutorial:
The objective is a nonlinear regression. Sounds simple enough, but I’m struggling to reproduce the results from the web page. My attempt looks like this:
using Flux
using StatsPlots
using IterTools: ncycle
xvals = collect(Float32, range(-10.0, 10.0, length=1000))
xvals = reshape(xvals, (1, 1000))
yvals = 0.1f0.*xvals'.*cos.(xvals') + 0.1f0*randn(Float32, length(xvals))
scatter(xvals', yvals, ms=0.1, linewidth=0, markerstrokewidth=0, legend=nothing)
approx = Chain(
    Dense(1, 64),
    Dense(64, 64, relu),
    Dense(64, 64, relu),
    Dense(64, 1)
)
loss(x, y) = Flux.Losses.mse(x', y)
trainer = Flux.Data.DataLoader((xvals, yvals[:, 1]), shuffle=true)
Flux.train!(loss, params(approx), ncycle(trainer, 100), ADAM())
scatter(xvals', yvals, ms=0.1, linewidth=0, markerstrokewidth=0, legend=nothing)
plot!(xvals', approx(xvals)')
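For reference, the array shapes involved can be checked without Flux. This is only a sketch that repeats the data construction above in Base Julia. It shows why the transposes appear: the inputs are stored as features × samples, while the targets come out as a single column.

```julia
# Same data construction as in the post, using only Base Julia.
xvals = reshape(collect(Float32, range(-10.0, 10.0, length=1000)), (1, 1000))
yvals = 0.1f0 .* xvals' .* cos.(xvals') + 0.1f0 * randn(Float32, length(xvals))

@show size(xvals)        # 1 row, 1000 columns (features × samples)
@show size(yvals)        # 1000 rows, 1 column (samples × 1)
@show size(yvals[:, 1])  # plain 1000-element vector of targets
```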
I’m using the same layers, activation functions and optimizer as the example (at least I think I am).
Unfortunately, the result isn’t even close. In 100 epochs, the example fits the shape reasonably well, but the code I posted here basically just gives two straight lines, one for negative x-values and one for positive x-values.
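One diagnostic that may be worth trying, sketched under assumptions: the loss as written compares `x'` with `y` and never calls `approx`, so it is possible that training leaves the parameters untouched. The plain-Julia sketch below does not use Flux. The names `net` and `W1` through `W4` are hypothetical stand-ins for the `Chain` above, and the zero biases are an assumption meant to mirror a freshly constructed model. It illustrates that a bias-free network with this layer layout produces one straight line through the origin for negative x and another for positive x, which resembles the plot described above.

```julia
using Random

relu(z) = max(z, zero(z))

# Hypothetical stand-in for the Chain in the post: same layer widths,
# random weights, and no bias terms (assumed to mimic an untrained model).
Random.seed!(0)
W1 = randn(64, 1)
W2 = randn(64, 64) ./ 8
W3 = randn(64, 64) ./ 8
W4 = randn(1, 64)

# Linear first layer, two ReLU layers, linear output layer.
net(x) = W4 * relu.(W3 * relu.(W2 * (W1 * x)))

x = reshape([-8.0, -4.0, -1.0, 1.0, 4.0, 8.0], 1, :)
y = net(x)

# Without biases, net(c * x) equals c * net(x) for any c > 0, so y ./ x is
# constant on each side of zero: one slope for x < 0, another for x > 0.
slopes = vec(y ./ x)
println("slopes for x < 0: ", slopes[1:3])
println("slopes for x > 0: ", slopes[4:6])
```

If the fitted curve from the Flux model shows the same behaviour, comparing the parameter values before and after `Flux.train!` would show whether any update actually happened.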
I could always try different layer structures and different optimizers, but I feel like I’m missing something obvious, because this is a simple example that I should be able to reproduce. It works in TensorFlow, so there is no reason it shouldn’t work in Flux. Thank you for any suggestions on how to get closer to the TensorFlow performance.
Edit: I’ve also looked at the following posts, which contain useful information on how to solve the problem in general, but I couldn’t find an explanation for why the performance differs so much from what the web page shows.