Reposting this question from a Slack channel.
The question of regularization for neural networks is a bit
complicated, and I’m no expert. It has frequently been observed that
increasing the total number of weights (increasing complexity) does
not necessarily lead to over-fitting, but the phenomenon is poorly
understood in general. In this paper Belkin et
al. (2019) introduce specific examples of networks where a “double
descent risk curve” should be expected (so no over-fitting). However,
in this preprint Nichani et al. (2020) argue that while increasing network width may not
lead to over-fitting, increasing depth can still lead to over-fitting.
Returning to the question, regularization options in MLJFlux/Flux are:
- Early stopping: you end training when an out-of-sample error begins to deteriorate. MLJ's `IteratedModel` wrapper is useful for automating this. See the Boston or MNIST examples here. (A minimal sketch is given after this list.)
- Add `Dropout` layers to your Flux model (aka chain), through the `builder` hyper-parameter of your MLJFlux model. See the Normalization & Regularization section of the Flux manual. (See the sketch after this list.)
- Add L1/L2 weight penalty regularization by specifying appropriate values of the hyper-parameters `lambda` (strength of regularization) and `alpha` (L2/L1 mix) of your MLJFlux model. If `alpha=0` then there is only L2 regularization. (Sketch after this list.)
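
Here's a minimal sketch of the early-stopping option, assuming a `NeuralNetworkRegressor` and hypothetical data `X, y`; the control values (`Patience(4)`, `NumberLimit(500)`, the holdout fraction) are placeholders you would tune:

```julia
using MLJ   # MLJFlux must also be in your environment

NNRegressor = @load NeuralNetworkRegressor pkg=MLJFlux
model = NNRegressor()

# Wrap the model so that the number of epochs is controlled by an
# out-of-sample loss:
iterated_model = IteratedModel(
    model=model,
    resampling=Holdout(fraction_train=0.7),  # data held out for the stopping criterion
    measure=l2,                              # loss monitored on the holdout set
    controls=[Step(1),                       # add one epoch per control cycle
              Patience(4),                   # stop after 4 consecutive loss increases
              NumberLimit(500)],             # hard cap on the number of cycles
)

# X, y = ...                      # your data
# mach = machine(iterated_model, X, y)
# fit!(mach)                      # trains until a control triggers a stop
```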
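
For the dropout option, the built-in `MLJFlux.Short` builder already exposes a `dropout` rate; otherwise you can define your own builder whose chain contains `Dropout` layers. The layer sizes and rates below are arbitrary choices for illustration, and the `in => out` layer syntax assumes a recent Flux:

```julia
using MLJ
import MLJFlux, Flux

NNRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

# Simplest: the built-in builder has a dropout field
model = NNRegressor(builder=MLJFlux.Short(n_hidden=64, dropout=0.5), epochs=20)

# More control: a custom builder with explicit Dropout layers
mutable struct DropoutBuilder <: MLJFlux.Builder
    n_hidden::Int
    dropout::Float64
end

function MLJFlux.build(b::DropoutBuilder, rng, n_in, n_out)
    return Flux.Chain(
        Flux.Dense(n_in => b.n_hidden, Flux.relu),
        Flux.Dropout(b.dropout),              # active during training only
        Flux.Dense(b.n_hidden => n_out),
    )
end

model = NNRegressor(builder=DropoutBuilder(64, 0.5), epochs=20)
```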
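
And for the weight-penalty option, `lambda` and `alpha` are just keyword hyper-parameters of the MLJFlux model; the value `lambda=0.01` below is a placeholder, and in practice you would tune it, e.g. with `TunedModel`:

```julia
using MLJ

NNRegressor = @load NeuralNetworkRegressor pkg=MLJFlux

# alpha = 0 gives a pure L2 penalty; alpha = 1 a pure L1 penalty
model = NNRegressor(lambda=0.01, alpha=0.0, epochs=20)

# Optionally, tune the penalty strength on a log scale:
r = range(model, :lambda, lower=1e-6, upper=1e-1, scale=:log10)
tuned_model = TunedModel(model=model,
                         tuning=Grid(resolution=10),
                         resampling=CV(nfolds=5),
                         measure=l2,
                         range=r)
```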