Approximating a Quadratic Function with Flux

I’m trying to get familiar with neural networks and Flux by estimating a series of simple models. First, I can successfully estimate a linear model using Flux:

using Plots
using Flux
using Flux: @epochs

gridsize = 100;
dgp(x) = -12x+3;
X = collect(range(0,stop=10,length=gridsize));
Y = dgp.(X);

# Collect (input, target) pairs; inputs are 1-element vectors since Dense expects arrays.
data = []
for i in 1:length(X)
    push!(data, ([X[i]], Y[i]))
end

model = Chain(Dense(1,1))            # a single linear layer: y = W*x .+ b
loss(x, y) = Flux.mse(model(x), y)
opt = Descent(0.01)                  # plain gradient descent, learning rate 0.01
ps = Flux.params(model)
@epochs 10 Flux.train!(loss, ps, data, opt)

# Plot the DGP against the fitted model; .data unwraps Flux's tracked arrays.
plot(X, [Y model(X').data'], label=["DGP" "Model"])

With only a few iterations, the model does a pretty good job:

[screenshot: plot of the DGP and the fitted linear model, nearly overlapping]
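Since the DGP is y = -12x + 3, a quick sanity check is to look at the fitted weight and bias, which should land near those values (a sketch, assuming this Flux version's tracked-array API and Chain indexing):

@show model[1].W.data   # weight; should be close to -12
@show model[1].b.data   # bias; should be close to 3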

I’m running into problems trying to approximate a quadratic function. The code is largely the same:

using Plots
using Flux
using Flux: @epochs

gridsize = 100;
dgp(x) = x^2;
X = collect(range(0,stop=10,length=gridsize));
Y = dgp.(X);

data = []
for i in 1:length(X)
    push!(data, ([X[i]], Y[i]))
end

Q = 10;                        # number of hidden nodes
model = Chain(Dense(1,Q,σ),    # hidden layer: 1 input → Q nodes, sigmoid activation
    Dense(Q,1,identity));      # output layer: linear combination of the Q nodes

loss(x, y) = Flux.mse(model(x), y)
opt = Descent(0.01)
para = Flux.params(model)
@epochs 10 Flux.train!(loss, para, data, opt)

# Plot.
plot(X,[Y model(X').data'],label=["DGP" "Model"])

Theoretically, I should be able to approximate the function f(x) = x^2 over my compact grid, and 10 hidden layers (i.e., Q = 10 in my code) should be sufficient for a fairly good fit. Running this code, however, produces a very “flat” model:

[screenshot: plot of the quadratic DGP against the fitted model, which is nearly flat]

I’ve tried different activation functions, changing the gradient-descent learning rate, and a few other things, but I’m wondering if I’m doing something wrong within Flux. Thanks in advance for any help!


If I’m reading the Flux docs correctly, the Dense function’s second argument is just the number of outputs:

Flux.Dense — Type.
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

So your

model = Chain(Dense(1,Q,σ),
    Dense(Q,1,identity));

doesn’t have Q hidden layers; it’s just two layers?
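For illustration (the widths here are arbitrary), adding hidden layers means stacking more Dense calls inside the Chain:

# Two layers total: one hidden layer of Q nodes plus a linear readout.
shallow = Chain(Dense(1, Q, σ), Dense(Q, 1));

# Three hidden layers, by contrast:
deeper = Chain(Dense(1, Q, σ), Dense(Q, Q, σ), Dense(Q, Q, σ), Dense(Q, 1));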


Sorry, that’s a typo. I actually have one hidden layer with 10 nodes; the second “layer” just linearly combines the nodes. This should be enough for a good approximation to a simple quadratic function. I’d also tried increasing the number of hidden layers, but that doesn’t help with the “flatness,” either.

The following small changes lead to a pretty good fit:

Q = 10;
model = Chain(Dense(1,Q,tanh),   # tanh activation instead of σ
    Dense(Q,1,identity));

loss(x, y) = Flux.mse(model(x), y)
opt = ADAM(0.001)                # ADAM optimiser instead of plain Descent
para = Flux.params(model)
@epochs 500 Flux.train!(loss, para, data, opt)
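Re-plotting with the same line as in the original post shows the network tracking the quadratic closely:

plot(X, [Y model(X').data'], label=["DGP" "Model"])

The two changes are tanh, which is zero-centered (unlike σ, whose outputs sit in (0, 1)), and ADAM, which adapts the step size per parameter rather than taking fixed Descent steps.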



So this is not related to Flux but to the activation function.

You can also try adding more neurons; 10 is small even for a simple quadratic function… These days it is better to build large/deep networks than shallow ones.

I will soon put an example here of some tests I did, showing that a three-layer network with 100 neurons per layer predicts a sigmoid function better than a simple one-layer, 20-neuron network, despite being trained on only 40 points…
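As a rough sketch of that kind of architecture (the tanh activation and exact settings are assumptions here, not the settings from my tests):

H = 100;                           # neurons per hidden layer
deep = Chain(Dense(1, H, tanh),    # hidden layer 1
    Dense(H, H, tanh),             # hidden layer 2
    Dense(H, H, tanh),             # hidden layer 3
    Dense(H, 1, identity));        # linear output layer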


Thank you all for the helpful replies. This answer helped a lot! In fact, if I leave everything in my original code the same but decrease the learning rate, it also works very well.
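Concretely, that change amounts to something like the following (the exact rate and epoch count here are illustrative, since I didn’t list them above):

opt = Descent(0.0001)                           # smaller than the original 0.01; illustrative value
@epochs 500 Flux.train!(loss, para, data, opt)  # more epochs than the original 10; illustrative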

Following up on my earlier message, here is finally a first version of the notebook! We see that a moderately deep network performs better than the shallow one…
