Three-layer neural network for a simple curve fit not working

I am trying to model the function cos(x) with a three-layer neural network. Despite a lot of debugging, I have not been able to find the source of the error. I suspect there is a substantial mistake in my gradient calculations that is keeping the model from fitting accurately. I would really appreciate some help with this. Thanks for your time.

using LinearAlgebra, Statistics, Plots


# Define the activation functions and their derivatives
function mysigmoid(x)
    1 ./ (1 .+ exp(-x));
end

function mysigmoid_derivative(x)
    sig = mysigmoid(x);
    sig .* (1 .- sig);
end

# Define the loss function
function loss_fn(y, y_pred)
    mean((y_pred-y).^2);
end

function loss_derivative(y, y_pred)
    2 .* (y_pred - y) ./ length(y);
end


# Define the neural network model
function networkmine(h0, w1, b1, w2, b2, w3, b3)
    z1 = w1 .* h0 .+ b1;
    h1 = mysigmoid.(z1);

    z2 = w2 .* h1 .+ b2;
    h2 = mysigmoid.(z2);

    z3 = w3 .* h2 .+ b3;
    h3 = mysigmoid.(z3);

    return h3, h2, h1, z3, z2, z1
end





# Train the neural network
n = 50;
x = LinRange(0, pi, n);
y = sin.(x);

# Initialize the weights and biases
w1 = 0.01 * randn(n, 1);
b1 = zeros(n, 1);
w2 = 0.01 * randn(n, 1);
b2 = zeros(n, 1);
w3 = 0.01 * randn(n, 1);
b3 = zeros(n, 1);

learning_rate = 0.01;
for i in 1:1000
    # Compute the predicted output
    global w1, b1, w2, b2, w3, b3,y_pred
    y_pred, h2, h1, z3, z2, z1 = networkmine(x, w1, b1, w2, b2, w3, b3);

    # Compute the error
    error = loss_fn(y, y_pred);

    # Compute the gradients
    d_y_pred = 2 * (y_pred - y) ./ length(y);
    d_w3 = h2 .* (d_y_pred .* mysigmoid_derivative.(z3));
    d_b3 = d_y_pred .* mysigmoid_derivative.(z3);

    d_h2 = (w3 .* d_y_pred) .* mysigmoid_derivative.(z3);
    d_w2 = h1 .* d_h2 .* mysigmoid_derivative.(z2);
    d_b2 = d_h2 .* mysigmoid_derivative.(z2);

    d_h1 = (w2 .* d_h2) .* mysigmoid_derivative.(z2);
    d_w1 = x .* d_h1 .* mysigmoid_derivative.(z1);
    d_b1 = d_h1 .* mysigmoid_derivative.(z1);

    # Update the weights and biases
    w3 = w3 .- learning_rate * d_w3;
    b3 = b3 .- learning_rate * d_b3;
    w2 = w2 .- learning_rate * d_w2;
    b2 = b2 .- learning_rate * d_b2;
    w1 = w1 .- learning_rate * d_w1;
    b1 = b1 .- learning_rate * d_b1;
end

# Plot the results
plot(x, y)
plot!(x, y_pred)

I think your code works – you just need to run it for more iterations. With learning_rate = 0.1 and 10000 iterations, I get this result:

Although it looks like it’s fitting a sine, not a cosine :wink:

Some general tips that might save you some headaches down the road:

  • You can define the “elementary” functions like mysigmoid without the dots – that makes it clearer what the method can do (as written, despite the dots, it does not actually work on vector input).
  • Monitor the progress: with a @show error in your training loop you can print the current loss and verify that it actually decreases, just a bit slowly.
  • Use functions: this is just a simple example, of course, but wrapping your training code inside one big function makes it considerably faster (at least if you don’t @show at every iteration), because globals tend to be slow in Julia (see also here). A sketch putting these tips together is below.
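
To make that concrete, here is a minimal sketch of how the pieces could fit together, reusing your networkmine, loss_fn and mysigmoid_derivative unchanged. The function name and defaults are just for illustration, and the learning rate and iteration count are the ones from my run above, nothing canonical:

# First tip: scalar method, no dots needed inside – broadcast with mysigmoid.(z) at the call site
mysigmoid(x) = 1 / (1 + exp(-x))

function train(x, y, w1, b1, w2, b2, w3, b3; learning_rate = 0.1, iters = 10_000)
    local y_pred
    for i in 1:iters
        y_pred, h2, h1, z3, z2, z1 = networkmine(x, w1, b1, w2, b2, w3, b3)

        # Second tip: print the loss occasionally to check that it really decreases
        i % 1000 == 0 && println("iter $i: loss = ", loss_fn(y, y_pred))

        # Same gradients and updates as in your loop, but on function-local variables
        d_y_pred = 2 * (y_pred - y) ./ length(y)
        d_w3 = h2 .* d_y_pred .* mysigmoid_derivative.(z3)
        d_b3 = d_y_pred .* mysigmoid_derivative.(z3)
        d_h2 = w3 .* d_y_pred .* mysigmoid_derivative.(z3)
        d_w2 = h1 .* d_h2 .* mysigmoid_derivative.(z2)
        d_b2 = d_h2 .* mysigmoid_derivative.(z2)
        d_h1 = w2 .* d_h2 .* mysigmoid_derivative.(z2)
        d_w1 = x .* d_h1 .* mysigmoid_derivative.(z1)
        d_b1 = d_h1 .* mysigmoid_derivative.(z1)

        w3 = w3 .- learning_rate * d_w3
        b3 = b3 .- learning_rate * d_b3
        w2 = w2 .- learning_rate * d_w2
        b2 = b2 .- learning_rate * d_b2
        w1 = w1 .- learning_rate * d_w1
        b1 = b1 .- learning_rate * d_b1
    end
    # The trained weights are discarded here; return them as well if you need them
    return y_pred
end

y_pred = train(x, y, w1, b1, w2, b2, w3, b3)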

Hope that helps :slight_smile:


Hi @RSH, welcome to the Julia Discourse!

I haven’t actually run your code, but here are a couple of general comments to add to @Sevi's useful answer. If you’re going to code derivatives by hand, it’s very important to unit test all of them. Looking at the whole code at once, even for a small example like this one, there are many things that could cause your model fit to not converge at all, to converge too slowly, or to converge to the wrong value.
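
For the individual activation functions such a test is cheap to write. As a rough sketch, using the Test standard library and a central difference (the finite_diff helper, the step size and the test points here are just hand-picked for illustration):

using Test

# Central finite difference with a hand-picked step size (illustration only)
finite_diff(f, x; h = 1e-6) = (f(x + h) - f(x - h)) / (2h)

@testset "mysigmoid_derivative" begin
    for x in (-2.0, -0.5, 0.0, 0.3, 1.7)
        @test mysigmoid_derivative(x) ≈ finite_diff(mysigmoid, x) atol = 1e-8
    end
end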

If you’re having trouble fitting data with your model and you don’t have unit tests for your gradients, it’s very hard to know whether:

a. Your gradients are wrong
b. Your model just isn’t capable of fitting that data
c. The two items above are fine, but there’s a problem with your optimization algorithm
d. Everything is fine and the convergence is just really slow (as @Sevi's post suggests could be the case here)

See this blog post for some general comments on testing gradient calculations by comparison with finite differences. An important point the post doesn’t mention: ideally you shouldn’t just test against finite differences at a single perturbation size, but also check that your finite-difference approximation converges to the true value at the correct rate (assuming your perturbation isn’t so small that rounding errors start ruining the convergence).
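
For a scalar function this kind of check is only a few lines. A rough sketch with forward differences, which are first-order accurate, so the error should shrink by roughly a factor of 10 per row until rounding noise takes over (the point x0 = 0.3 and the step sizes are arbitrary choices for illustration):

x0 = 0.3
exact = mysigmoid_derivative(x0)
for h in (1e-1, 1e-2, 1e-3, 1e-4)
    fd = (mysigmoid(x0 + h) - mysigmoid(x0)) / h   # forward difference approximation
    println("h = $h   error = ", abs(fd - exact))  # should decrease roughly linearly in h
end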

One more thing: I assume you’re coding the derivatives yourself for educational purposes. If that’s not the case, you should really consider using automatic differentiation when working with neural network models, and more generally whenever you want to differentiate mathematical code. Even if you are writing the derivatives manually for your own education, it’s worth comparing your fully manual code and results with what you get from an autodiff engine.
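
As a rough sketch of that comparison (assuming you have ForwardDiff.jl installed; Zygote.jl would be another option), checking just the third-layer weight gradient d_w3 against an automatically computed one:

using ForwardDiff   # assuming ForwardDiff.jl is installed

# Loss as a function of the third-layer weights only, everything else held fixed
loss_w3(w3) = loss_fn(y, networkmine(x, w1, b1, w2, b2, w3, b3)[1])
grad_ad = ForwardDiff.gradient(loss_w3, w3)

# Manual gradient for the same weights, as in the training loop
y_pred, h2, h1, z3, z2, z1 = networkmine(x, w1, b1, w2, b2, w3, b3)
grad_manual = h2 .* (2 * (y_pred - y) ./ length(y)) .* mysigmoid_derivative.(z3)

maximum(abs.(grad_ad .- grad_manual))   # should agree to roughly machine precision if d_w3 is right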
