How do I regularise MLJFlux models?

Reposting this question from a Slack channel.

The question of regularization for neural networks is a bit
complicated, and I’m no expert. It has frequently been observed that
increasing the total number of weights (increasing complexity) does
not necessarily lead to over-fitting, but the phenomenon is poorly
understood in general. Belkin et al. (2019) exhibit specific examples
of networks where a “double descent” risk curve should be expected
(so no over-fitting). However, Nichani et al. (2020) argue in a
preprint that while increasing network width may not lead to
over-fitting, increasing depth still can.

Returning to the question, regularization options in MLJFlux/Flux are:

  • Early stopping: you end training when an out-of-sample error
    begins to deteriorate. MLJ’s IteratedModel wrapper is useful for
    automating this; see, for example, the Boston or MNIST examples,
    or the first sketch after this list.

  • Add Dropout layers to your Flux model (aka chain), via the
    builder hyper-parameter of your MLJFlux model. See the
    Normalisation & Regularisation section of the Flux manual, and
    the second sketch below.

  • Add L1/L2 weight penalty regularization by specifying
    appropriate values of the hyper-parameters lambda (strength of
    regularization) and alpha (the L2/L1 mix) of your MLJFlux model:
    alpha=0 gives a pure L2 penalty and alpha=1 a pure L1 penalty.
    See the third sketch below.
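
Here is a minimal early-stopping sketch using IteratedModel. The
synthetic data and the particular control settings (Patience(5),
NumberLimit(100), etc.) are illustrative placeholders, not
recommendations:

    using MLJ, MLJFlux   # MLJFlux exports NeuralNetworkRegressor

    X, y = make_regression(200, 5)   # toy data, for illustration only

    model = NeuralNetworkRegressor()

    # Train one epoch per step; stop after 5 consecutive deteriorations
    # of the out-of-sample loss, or after 100 epochs, whichever is first:
    iterated_model = IteratedModel(
        model=model,
        resampling=Holdout(fraction_train=0.7),
        measure=l2,
        controls=[Step(1), Patience(5), NumberLimit(100)],
    )

    mach = machine(iterated_model, X, y)
    fit!(mach)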
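
For dropout, the built-in MLJFlux.Short builder already has a dropout
field, or you can write your own builder. In the sketch below,
DropoutMLP, the layer width, and the dropout probability are all
invented for illustration:

    using MLJFlux, Flux

    # Built-in route: Short has a dropout hyper-parameter.
    builder = MLJFlux.Short(n_hidden=64, dropout=0.5, σ=Flux.relu)

    # Custom route: subtype MLJFlux.Builder and implement MLJFlux.build,
    # inserting Dropout layers wherever you like in the chain:
    mutable struct DropoutMLP <: MLJFlux.Builder
        hidden::Int    # hidden-layer width
        p::Float64     # dropout probability
    end

    function MLJFlux.build(b::DropoutMLP, rng, n_in, n_out)
        init = Flux.glorot_uniform(rng)   # rng makes builds reproducible
        Flux.Chain(
            Flux.Dense(n_in => b.hidden, Flux.relu; init=init),
            Flux.Dropout(b.p),            # active during training only
            Flux.Dense(b.hidden => n_out; init=init),
        )
    end

    model = NeuralNetworkRegressor(builder=DropoutMLP(64, 0.5))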
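
And the weight-penalty route. The lambda value and the tuning bounds
below are arbitrary guesses; in practice you would tune lambda like
any other hyper-parameter, for example on a log scale:

    using MLJ, MLJFlux

    # alpha = 0 gives a pure L2 (ridge) penalty, alpha = 1 pure L1
    # (lasso); intermediate values give an elastic-net style mix:
    model = NeuralNetworkRegressor(lambda=0.01, alpha=0.0)

    # Tune lambda on a log scale (the bounds are illustrative):
    r = range(model, :lambda, lower=1e-5, upper=1e-1, scale=:log10)
    tuned_model = TunedModel(model=model, range=r,
                             resampling=CV(nfolds=5), measure=l2)
    mach = machine(tuned_model, X, y)   # X, y as in the first sketch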