It looks like the activations() function (in basic.jl) runs its own forward pass
to obtain the activations, which would mean the forward pass is computed twice:
once for the prediction and once more for the activations.
I guess a fix for this would be to implement a custom alternative to Chain()
that propagates both the prediction and a list of activations
(or just the regularization term involving the activations)?
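
To make the idea concrete, here is a rough sketch in plain Julia with no Flux dependency. The name `forward_with_activations` is hypothetical and not part of Flux, and the toy "layers" are just anonymous functions standing in for real ones. It walks the layers once and returns both the final output and every intermediate output. The tuple is built without in-place mutation, on the assumption that the AD backend may not handle `push!`.

```julia
# Hypothetical helper, not a Flux API: a single forward pass that returns
# the prediction together with all intermediate activations.
function forward_with_activations(layers, x)
    acts = ()                  # grown without mutation (assumed to be AD-friendlier)
    for layer in layers
        x = layer(x)           # apply the next layer to the running value
        acts = (acts..., x)    # record this layer's output
    end
    return x, acts
end

# Toy stand-ins for layers: scale, shift, then a ReLU-like clamp.
layers = (x -> 2 .* x, x -> x .+ 1, x -> max.(x, 0))

yhat, acts = forward_with_activations(layers, [1.0, -2.0])

# Example regularization term: sum of squared activations over all layers.
penalty = sum(a -> sum(abs2, a), acts)
```

The loss could then use `yhat` and `penalty` from the same call, so nothing is evaluated a second time. If only the regularization term is needed, the loop could accumulate the penalty directly instead of keeping the activations.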