Flux.Chain vs. expanding everything in a function

Hi, I am trying to understand the difference between using Flux.Chain and expanding all of the operations in a single function.

For instance, given the following object:

Chain(Dense(d_in, d_hidden, tanh), Dense(d_hidden, d_hidden, tanh), Dense(d_hidden, d_in, tanh))

if I replace it with

function f(x)
    out = Dense(d_in, d_hidden, tanh)(x)
    out = Dense(d_hidden, d_hidden, tanh)(out)
    out = Dense(d_hidden, d_in, tanh)(out)
    return out
end

Does anyone know whether these two behave exactly the same during training and testing?

Thank you!

They are not exactly the same. In the second example, f constructs three new Dense layers with freshly initialized weights on every call and discards them as soon as it returns, so there are no persistent parameters to train. If you construct the layers beforehand, pass them into the function, and pass their parameters when taking a gradient, then the behaviour will be the same. You’ll also have reinvented most of the functionality of Chain!
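To make that concrete, here is a minimal sketch of the "construct beforehand and pass in" version next to a Chain built from the same layer objects. The sizes, the names l1, l2, l3 and loss, and the dummy sum-of-squares objective are all made up for illustration. It also assumes a Flux version that accepts the `in => out` form of Dense and explicit gradients (differentiating with respect to the layers themselves), which stands in for the older "pass their parameters" style mentioned above.

```julia
using Flux

# Hypothetical sizes, chosen only for this sketch.
d_in, d_hidden = 4, 8

# Build the layers once, so their weights persist between calls.
l1 = Dense(d_in => d_hidden, tanh)
l2 = Dense(d_hidden => d_hidden, tanh)
l3 = Dense(d_hidden => d_in, tanh)

# Hand-written forward pass that receives the layers as an argument.
f(layers, x) = layers[3](layers[2](layers[1](x)))

# A Chain wrapping the very same layer objects.
model = Chain(l1, l2, l3)

x = rand(Float32, d_in, 5)   # dummy batch of 5 inputs

# Both forward passes use the same weights, so the outputs agree.
y_manual = f((l1, l2, l3), x)
y_chain = model(x)

# Dummy objective, only here so there is something to differentiate.
loss(y) = sum(abs2, y)

# Explicit gradients with respect to the layers (manual) and the Chain.
g_manual = Flux.gradient(ls -> loss(f(ls, x)), (l1, l2, l3))[1]
g_chain = Flux.gradient(m -> loss(m(x)), model)[1]
```

Under these assumptions, `y_manual ≈ y_chain` holds, and `g_manual[1].weight` matches `g_chain.layers[1].weight`. The manual version only behaves like the Chain because the layers live outside f.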