Flux.Chain vs expanding everything in a function

They are not exactly the same, because the second example creates the 3 Dense layers inside f and discards them as soon as f returns, so no weights persist between calls for training to update. If you construct them beforehand, pass them into the function, and pass their parameters when taking the gradient, then the behaviour will be the same. You'll also have re-invented most of the functionality of Chain!
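To make that concrete, here is a minimal sketch of the "construct beforehand and pass in" version next to a Chain built from the same layers. The layer sizes, the activation, the input, and the sum-based loss are all made up for illustration and are not taken from the original post. It also assumes a recent Flux version where the `in => out` Dense constructor and explicit-style `Flux.gradient` are available; with the older implicit style you would pass `Flux.params(l1, l2, l3)` instead.

```julia
using Flux

# Hypothetical layer sizes; the point is that the layers are built once, outside f.
l1 = Dense(4 => 8, relu)
l2 = Dense(8 => 8, relu)
l3 = Dense(8 => 2)

# Hand-expanded forward pass that receives the layers as an argument
# instead of constructing (and then discarding) them on every call.
f(layers, x) = layers[3](layers[2](layers[1](x)))

layers = (l1, l2, l3)
model = Chain(l1, l2, l3)   # wraps the very same layer objects

x = rand(Float32, 4, 5)     # made-up batch: 4 features, 5 samples

# Same layers, same input, so the forward passes agree.
y_manual = f(layers, x)
y_chain = model(x)

# Gradients with respect to the layers themselves (explicit style).
g_manual = Flux.gradient(ls -> sum(f(ls, x)), layers)[1]
g_chain = Flux.gradient(m -> sum(m(x)), model)[1]
```

Under these assumptions `g_manual` is a tuple with one entry per layer, while `g_chain` nests the same entries under a `layers` field, which is roughly the bookkeeping Chain does for you.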