Reposting this question from a Slack channel.
The question of regularization for neural networks is a bit
complicated, and I’m no expert. It has frequently been observed that
increasing the total number of weights (increasing complexity) does
not necessarily lead to over-fitting, but the phenomenon is poorly
understood in general. In this paper Belkin et
al. (2019) introduce specific examples of networks where a “double
descent risk curve” should be expected (so no over-fitting). However,
in this preprint Nichani et al. (2020) argue that while increasing network width may not
lead to over-fitting, increasing depth can still lead to over-fitting.
Returning to the question, the regularization options in MLJFlux/Flux are listed below (rough sketches of each option follow the list):

- Early stopping: you end training when an out-of-sample error begins to deteriorate. MLJ's `IteratedModel` wrapper is useful for automating this. See the Boston or MNIST examples here.
- Add `Dropout` layers to your Flux model (aka chain), via the `builder` hyper-parameter of your MLJFlux model. See the Normalization & Regularization section of the Flux manual.
- Add L1/L2 weight-penalty regularization by specifying appropriate values of the hyper-parameters `lambda` (strength of regularization) and `alpha` (the L2/L1 mix) of your MLJFlux model. If `alpha=0` there is only L2 regularization; if `alpha=1`, only L1.
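
Here's a minimal sketch of the early-stopping option, assuming a `NeuralNetworkRegressor` trained on synthetic data from `make_regression`. The builder, measure, and control values are placeholders, and the exact `IteratedModel` keywords can vary a little between MLJ versions, so check the MLJIteration docs:

```julia
using MLJ
import MLJFlux

# Illustrative data only; substitute your own table / target:
X, y = make_regression(500, 10)

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux
model = NeuralNetworkRegressor(builder=MLJFlux.Short(n_hidden=32), epochs=1)

# Wrap the model so training proceeds one epoch at a time and stops
# when the holdout loss stops improving:
iterated_model = IteratedModel(model=model,
                               resampling=Holdout(fraction_train=0.8),
                               measure=rms,
                               controls=[Step(1),            # one epoch per control cycle
                                         Patience(5),        # stop after 5 consecutive deteriorations
                                         NumberLimit(200)],  # hard cap on the number of cycles
                               retrain=true)                 # retrain on all data once stopped

mach = machine(iterated_model, X, y)
fit!(mach)
```

The `controls` vector above is just one reasonable combination; the MLJ manual's section on controlling iterative models lists everything available.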
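
For dropout, the quickest route is the built-in `MLJFlux.Short` builder, which already has a `dropout` hyper-parameter. The sketch below shows the more general custom-builder route instead; the `DropoutMLP` name, layer widths, and dropout rate are made up for illustration, and the `build` signature with the `rng` argument (and the `Dense(in => out, ...)` form) assumes reasonably recent MLJFlux/Flux versions:

```julia
using MLJ
import MLJFlux, Flux

# A custom builder whose `build` method returns a Flux chain containing Dropout layers.
mutable struct DropoutMLP <: MLJFlux.Builder
    n1::Int          # width of first hidden layer
    n2::Int          # width of second hidden layer
    dropout::Float64 # dropout probability
end

function MLJFlux.build(b::DropoutMLP, rng, n_in, n_out)
    init = Flux.glorot_uniform(rng)
    return Flux.Chain(
        Flux.Dense(n_in => b.n1, Flux.relu, init=init),
        Flux.Dropout(b.dropout),
        Flux.Dense(b.n1 => b.n2, Flux.relu, init=init),
        Flux.Dropout(b.dropout),
        Flux.Dense(b.n2 => n_out, init=init),
    )
end

# Pass the builder to the MLJFlux model via the `builder` hyper-parameter:
NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux
model = NeuralNetworkRegressor(builder=DropoutMLP(64, 32, 0.3), epochs=50)
```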
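
The weight-penalty option is just a matter of setting two hyper-parameters on the MLJFlux model; the values below are arbitrary placeholders:

```julia
using MLJ
import MLJFlux

NeuralNetworkRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

# Elastic-net style penalty: `lambda` scales the penalty, `alpha` mixes L2 and L1.
model = NeuralNetworkRegressor(builder=MLJFlux.Short(n_hidden=32),
                               epochs=50,
                               lambda=0.01,  # overall regularization strength
                               alpha=0.0)    # 0.0 => pure L2, 1.0 => pure L1
```

Like any other hyper-parameters, `lambda` and `alpha` can also be wrapped in a `TunedModel` and searched over, typically with `lambda` on a log scale.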