Misbehaving model / bad neural net architecture for the job?

So I thought I’d take the Universal Approximation Theorem for a spin by training a neural net to reproduce the U.S. Consumer Price Index as a function of time:

A simple model with 1 hidden layer with 3 neurons does just OK, as expected. I thought I’d increase the number of neurons and hidden layers and watch the approximation get closer and closer to the actual data, probably through overfitting.
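For reference, the baseline described above (1 input, one hidden layer of 3 sigmoid units, 1 linear output) can be sketched as follows. The thread's linked code is Julia; this is a hypothetical Python/NumPy stand-in, with a synthetic exponential curve in place of the real CPI series and time already rescaled to [-1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: a smooth exponential growth curve in place of
# real CPI values, with time already rescaled to [-1, 1].
t = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.exp(1.5 * t)

# 1 input -> 3 sigmoid hidden units -> 1 linear output
W1 = rng.normal(0, 1, (1, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 1, (3, 1)); b2 = np.zeros(1)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Plain full-batch gradient descent on mean squared error
lr = 0.5
for _ in range(5000):
    h = sigmoid(t @ W1 + b1)            # hidden activations, shape (200, 3)
    pred = h @ W2 + b2                  # network output, shape (200, 1)
    err = pred - y
    # Backprop through the two layers
    gW2 = h.T @ err / len(t); gb2 = err.mean(0)
    dh = err @ W2.T * h * (1 - h)       # sigmoid derivative is h * (1 - h)
    gW1 = t.T @ dh / len(t); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(((sigmoid(t @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
print(mse)
```

With inputs rescaled like this, even the tiny 3-neuron network fits the smooth curve "just OK", matching the behavior described above.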

Instead, the opposite seems to have happened. Increasing the number of neurons and hidden layers didn’t make the output look any different, until at some point the model just produced a line that was almost entirely flat.

I was expecting the output to look more wrinkly, at the very least, but instead the opposite happened.

Anyone have guesses on what’s going on, or how to “debug” this sort of thing? I suppose I could try to find a pre-built package that’s designed for this sort of thing, but I’d like to improve my understanding of how to architect/design neural networks for various problems and not just get a solution to this one specific problem.


It might be a problem with the initialization: the σ activation function easily saturates if its inputs are too large. I did a quick check using relu, which seemed to train fine even with 300 hidden neurons.
In any case, a good test is to plot the input-output function of the untrained network. In the example, with σ activations, I got an almost constant function with randomly initialized weights.
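The saturation effect is easy to demonstrate in isolation. Here is a hypothetical NumPy sketch (not the thread's Julia code): feeding raw year-like values into a sigmoid unit pins its output at 0 or 1 across the whole input range, which is exactly the near-constant function described above, while rescaling the inputs restores a usable operating range:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)
w, b = np.random.randn(), np.random.randn()

# Raw time values on the scale of calendar years are far outside the
# narrow region where sigmoid has any slope.
t = np.linspace(1913, 2023, 100)
out_raw = sigmoid(w * t + b)
spread_raw = out_raw.max() - out_raw.min()
print(spread_raw)        # essentially zero: the unit is saturated

# Rescaling inputs to roughly [-1, 1] puts them back in the sigmoid's
# responsive range.
t_scaled = (t - t.mean()) / (t.max() - t.min()) * 2
out_scaled = sigmoid(w * t_scaled + b)
spread_scaled = out_scaled.max() - out_scaled.min()
print(spread_scaled)     # the output now varies meaningfully
```

A saturated unit also has a near-zero gradient, so training cannot pull it out of that regime, which is why the network stays stuck on an almost flat line. Relu avoids this because it does not saturate for large positive inputs.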


Thanks! Using relu fixed the main problem. Decreasing the batch size to something sane also helped. The final result:

[image: plot of the final fit of the trained network to the CPI data]

The code, for anyone else interested in doing this sort of thing: https://www.christopheroei.com/b/ad8197e069984d094bb771e8f73545287087219a6da1509941480e05cf0b4e96.jl
