They are not exactly the same, because the second example immediately discards the 3 Dense
layers after f
returns. If you constructed them beforehand, passed them into the function and passed their parameters when taking a gradient, then the behaviour will be the same. You’ll also have re-invented most of the functionality of Chain
!