Approximating a Quadratic Function with Flux

I’m trying to get familiar with neural networks and Flux by estimating a series of simple models. First, I can successfully estimate a linear model using Flux:

using Plots
using Flux
using Flux: @epochs

gridsize = 100;
dgp(x) = -12x+3;
X = collect(range(0,stop=10,length=gridsize));
Y = dgp.(X);

data = [([x], y) for (x, y) in zip(X, Y)]

model = Chain(Dense(1,1))
loss(x, y) = Flux.mse(model(x), y)
opt = Descent(0.01)
ps = Flux.params(model)
@epochs 10 Flux.train!(loss, ps, data, opt)

# Plot.
plot(X,[Y model(X').data'],label=["DGP" "Model"])
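As a sanity check in plain Julia (no Flux required): since the data are noiseless, an ordinary least-squares fit should recover the DGP coefficients exactly, which is what the trained `Dense(1,1)` layer is converging toward. The names below are just illustrative:

```julia
# Recreate the grid and targets from the post above.
gridsize = 100
X = collect(range(0, stop = 10, length = gridsize))
Y = -12 .* X .+ 3

# Least-squares solve with a design matrix [x 1];
# on noiseless data this recovers slope -12 and intercept 3.
A = [X ones(gridsize)]
slope, intercept = A \ Y

(slope, intercept)  # ≈ (-12.0, 3.0)
```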

With only a few iterations, the model does a pretty good job:

[screenshot: plot of the DGP vs. the fitted model]

I’m running into problems trying to approximate a quadratic function. The code is largely the same:

using Plots
using Flux
using Flux: @epochs

gridsize = 100;
dgp(x) = x^2;
X = collect(range(0,stop=10,length=gridsize));
Y = dgp.(X);

data = [([x], y) for (x, y) in zip(X, Y)]

Q = 10;
model = Chain(Dense(1,Q,σ),
    Dense(Q,1,identity));

loss(x, y) = Flux.mse(model(x), y)
opt = Descent(0.01)
para = Flux.params(model)
@epochs 10 Flux.train!(loss, para, data, opt)

# Plot.
plot(X,[Y model(X').data'],label=["DGP" "Model"])

Theoretically, I should be able to represent the function f(x) = x^2 over my compact grid, and 10 hidden layers (i.e., Q = 10 in my code) should be sufficient for a fairly good approximation. Running this code, however, generates a very “flat” model:

[screenshot: plot showing the fitted model as a nearly flat line under the quadratic DGP]

I’ve tried different activation functions, changing the speed of the gradient descent, and a few other things, but I’m wondering if I’m doing something wrong within Flux. Thanks in advance for any help!

If I’m reading the Flux docs correctly, the Dense function’s second argument is just the number of outputs:

Flux.Dense — Type.
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)
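To make that concrete, here is a plain-Julia sketch (no Flux needed) of the computation a single `Dense(1, 10, tanh)` layer performs; `W`, `b`, and `layer` are just illustrative names:

```julia
# W is an out×in weight matrix and b an out-vector, so the second
# argument of Dense only sets the output width of a *single* layer.
W = randn(10, 1)          # out = 10, in = 1
b = randn(10)
layer(x) = tanh.(W * x .+ b)

x = reshape([2.0], 1, 1)  # one sample with one feature
y = layer(x)
size(y)                   # (10, 1): ten outputs, still one layer
```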

So your

model = Chain(Dense(1,Q,σ),
    Dense(Q,1,identity));

doesn’t have Q hidden layers; it’s just two layers?

Sorry, that’s a typo. I actually have one hidden layer with 10 nodes; the second “layer” just linearly combines the nodes. This should be enough for a good approximation to a simple quadratic function. I’ve also tried increasing the number of hidden layers, but that doesn’t help with the “flatness,” either.

The following small changes lead to a pretty good fit:

Q = 10;
model = Chain(Dense(1,Q,tanh),
    Dense(Q,1,identity));

loss(x, y) = Flux.mse(model(x), y)
opt = ADAM(.001)
para = Flux.params(model)
@epochs 500 Flux.train!(loss, para, data, opt)


So this is not a problem with Flux itself but with the training setup: swapping σ for tanh and plain Descent for ADAM (plus more epochs) fixes it.
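One plausible contributor, sketched in plain Julia (no Flux needed): the inputs run from 0 to 10, and the logistic function saturates for large arguments, so its derivative σ′(z) = σ(z)(1 − σ(z)) is nearly zero over much of the grid, starving the first layer of gradient signal:

```julia
# Logistic sigmoid and its derivative, defined by hand for illustration.
sigmoid(z) = 1 / (1 + exp(-z))
dsigmoid(z) = sigmoid(z) * (1 - sigmoid(z))

sigmoid(10.0)   # ≈ 0.99995, already saturated at the right edge of the grid
dsigmoid(10.0)  # ≈ 4.5e-5, almost no gradient signal
dsigmoid(0.0)   # 0.25, the maximum slope
```

Normalising the inputs to a range like [−1, 1] would be another way to keep the activations out of the saturated region.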

You can also try adding more neurons; 10 is small even for a simple quadratic function… These days it is better to build large/deep networks than shallow ones.

I will soon post an example here of some tests I did, showing that a 3-layer network with 100 neurons per layer predicts a sigmoid function better than a simple 1-layer, 20-neuron network, despite being trained on only 40 points…

Thank you all for the helpful replies. This answer helped a lot! In fact, if I leave everything in my original code the same, but decrease the learning rate, it also works very well.
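A toy illustration (not the thread’s model) of why the step size alone can make or break plain gradient descent: minimizing f(w) = (w − 3)² from w = 0, a small learning rate converges while a large one overshoots and diverges. The names `grad` and `descend` are just for this sketch:

```julia
# Gradient of f(w) = (w - 3)^2.
grad(w) = 2 * (w - 3)

# Run fixed-step gradient descent from w = 0 for a given learning rate.
function descend(lr; steps = 50)
    w = 0.0
    for _ in 1:steps
        w -= lr * grad(w)
    end
    return w
end

descend(0.1)  # converges close to the minimizer w = 3
descend(1.1)  # each step overshoots; the iterates blow up
```

With large inputs (up to 10) and large targets (up to 100), the effective step taken by `Descent(0.01)` on the quadratic problem is much bigger than on the linear one, which is consistent with a smaller rate fixing the flat fit.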

Following up on my earlier message, here is a first version of the notebook! We see that a moderately deep network performs better than the shallow one…