So I thought I’d take the Universal Approximation Theorem for a spin by training a neural net to reproduce the U.S. Consumer Price Index as a function of time.
A simple model with one hidden layer of 3 neurons does just OK, as expected. I figured I’d increase the number of neurons and hidden layers and watch the approximation get closer and closer to the actual data, probably through overfitting.
Instead, the opposite seems to have happened. Increasing the number of neurons and hidden layers didn’t make the output look any different, until at some point the model just produced a line that was almost entirely flat.
If anything, I was expecting the output to look more wrinkly, not flatter.
Anyone have guesses on what’s going on, or how to “debug” this sort of thing? I suppose I could look for a pre-built package designed for this kind of fitting, but I’d rather improve my understanding of how to architect/design neural networks for different problems than just get a solution to this one.
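For reference, here’s roughly the kind of setup I mean (a simplified PyTorch sketch with a synthetic CPI-like curve standing in for the real series; the hyperparameters are placeholders, not my exact code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for the CPI series: roughly exponential growth over ~70 years.
# (The real series would come from BLS/FRED; this is just for illustration.)
t = torch.linspace(1950.0, 2020.0, 500).unsqueeze(1)  # time fed in as raw years
y = 24.0 * torch.exp(0.034 * (t - 1950.0))            # CPI-like values

# One hidden layer with 3 sigmoid neurons, as in the simple model described above.
model = nn.Sequential(
    nn.Linear(1, 3),
    nn.Sigmoid(),
    nn.Linear(3, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(t), y)
    loss.backward()
    optimizer.step()
    if step % 1000 == 0:
        print(step, loss.item())
```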
It might be a problem with the initialization: the σ activation function saturates easily if its inputs are too large. I did a quick check using ReLU, which seemed to train fine even with 300 hidden neurons.
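To illustrate the saturation (this assumes the time axis goes in as raw year-scale values, which is a guess about the original setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A default-initialized first layer has O(1) weights, so a raw input like 1995
# typically lands hundreds or thousands of units away from zero before the sigmoid.
layer = nn.Linear(1, 3)
z = layer(torch.tensor([[1995.0]]))
print(z)                 # pre-activations: typically large in magnitude
print(torch.sigmoid(z))  # outputs pinned near 0 or 1, where the gradient is ~0
```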
In any case, a good test is to plot the input-output function of the untrained network. In this example, with σ activations, I got an almost constant function from the randomly initialized weights.
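Something like this, for the curious (a sketch assuming the same 1-D year-in, index-out setup):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)

def untrained_net(activation, width=300):
    # One hidden layer, no training: just look at what the random init produces.
    return nn.Sequential(nn.Linear(1, width), activation, nn.Linear(width, 1))

t = torch.linspace(1950.0, 2020.0, 500).unsqueeze(1)

with torch.no_grad():
    for name, act in [("sigmoid", nn.Sigmoid()), ("relu", nn.ReLU())]:
        y0 = untrained_net(act)(t)
        plt.plot(t.squeeze().numpy(), y0.squeeze().numpy(), label=f"untrained, {name}")

plt.xlabel("year")
plt.ylabel("network output")
plt.legend()
plt.show()
```

With raw year inputs, the sigmoid curve came out almost perfectly flat for me, which matches the symptom described above; the ReLU one at least varies with the input.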