Misbehaving model / bad neural net architecture for the job?

So I thought I’d take the Universal Approximation Theorem for a spin by training a neural net to reproduce the U.S. Consumer Price Index as a function of time:

A simple model with 1 hidden layer with 3 neurons does just OK, as expected. I thought I’d increase the number of neurons and hidden layers and watch the approximation get closer and closer to the actual data, probably through overfitting.
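For reference, the baseline described above (1 input, one hidden layer of 3 sigmoid units, 1 linear output) can be sketched as follows. The thread's linked code is Julia; this is a hypothetical Python/NumPy stand-in, with a synthetic exponential curve in place of the real CPI series and time already rescaled to [-1, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: a smooth exponential growth curve in place of
# real CPI values, with time already rescaled to [-1, 1].
t = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.exp(1.5 * t)

# 1 input -> 3 sigmoid hidden units -> 1 linear output
W1 = rng.normal(0, 1, (1, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 1, (3, 1)); b2 = np.zeros(1)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Plain full-batch gradient descent on mean squared error
lr = 0.5
for _ in range(5000):
    h = sigmoid(t @ W1 + b1)            # hidden activations, shape (200, 3)
    pred = h @ W2 + b2                  # network output, shape (200, 1)
    err = pred - y
    # Backprop through the two layers
    gW2 = h.T @ err / len(t); gb2 = err.mean(0)
    dh = err @ W2.T * h * (1 - h)       # sigmoid derivative is h * (1 - h)
    gW1 = t.T @ dh / len(t); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(((sigmoid(t @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
print(mse)
```

With inputs rescaled like this, even the tiny 3-neuron network fits the smooth curve "just OK", matching the behavior described above.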

Instead, the opposite seems to have happened. Increasing the number of neurons and hidden layers didn’t make the output look any different, until at some point the model just produced a line that was almost entirely flat.

I was expecting the output to look more wrinkly, at the very least, but instead the opposite happened.

Anyone have guesses on what’s going on, or how to “debug” this sort of thing? I suppose I could try to find a pre-built package that’s designed for this sort of thing, but I’d like to improve my understanding of how to architect/design neural networks for various problems and not just get a solution to this one specific problem.


It might be a problem with the initialization: the σ activation function easily saturates if its inputs are too large. I did a quick check using relu, which seemed to train fine even with 300 hidden neurons.
In any case, a good test is to plot the input-output function of the untrained network. In the example, with σ activations, I got an almost constant function with randomly initialized weights.
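The saturation effect is easy to demonstrate in isolation. Here is a hypothetical NumPy sketch (not the thread's Julia code): feeding raw year-like values into a sigmoid unit pins its output at 0 or 1 across the whole input range, which is exactly the near-constant function described above, while rescaling the inputs restores a usable operating range:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)
w, b = np.random.randn(), np.random.randn()

# Raw time values on the scale of calendar years are far outside the
# narrow region where sigmoid has any slope.
t = np.linspace(1913, 2023, 100)
out_raw = sigmoid(w * t + b)
spread_raw = out_raw.max() - out_raw.min()
print(spread_raw)        # essentially zero: the unit is saturated

# Rescaling inputs to roughly [-1, 1] puts them back in the sigmoid's
# responsive range.
t_scaled = (t - t.mean()) / (t.max() - t.min()) * 2
out_scaled = sigmoid(w * t_scaled + b)
spread_scaled = out_scaled.max() - out_scaled.min()
print(spread_scaled)     # the output now varies meaningfully
```

A saturated unit also has a near-zero gradient, so training cannot pull it out of that regime, which is why the network stays stuck on an almost flat line. Relu avoids this because it does not saturate for large positive inputs.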


Thanks! Using relu fixed the main problem. Decreasing the batch size to something sane also helped. The final result:

[image: plot of the final fit of the trained network to the CPI data]

The code, for anyone else interested in doing this sort of thing: https://www.christopheroei.com/b/ad8197e069984d094bb771e8f73545287087219a6da1509941480e05cf0b4e96.jl
