Possible anonymous function bug in Flux

Has anyone noticed that

m = Chain(Dense(5, 2), Dense(2, 2), Dense(2, 5))

produces an entirely different result than

m = Chain(x -> Dense(5, 2)(x), Dense(2, 2), Dense(2, 5))

Are anonymous functions somehow not supported in Chain?

Also, by "different" I do not mean that the randomly generated values at initialization differ. I mean that when you seed the random number generator and train one versus the other, the anonymous-function version not only fails to produce the same output, but even after more training it does not achieve any meaningful fit to the data.

For example, take any model, replace one of its Dense layers with a layer written as x -> Dense(…)(x), and see whether the model still works as it did before.
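
Part of the effect is easy to check directly: x -> Dense(5, 2)(x) calls the Dense constructor every time the function runs, so each forward pass uses a brand-new, randomly initialised layer and there is no persistent set of weights to train. The sketch below illustrates this. It assumes Flux 0.13 or later, where layers are written Dense(5 => 2) instead of the Dense(5, 2) used in this thread.

```julia
using Flux  # assumes Flux 0.13+ for the `Dense(in => out)` syntax

x = rand(Float32, 5)

# The anonymous function evaluates `Dense(5 => 2)` on every call,
# so each forward pass uses a new, randomly initialised layer.
f = x -> Dense(5 => 2)(x)
y1 = f(x)
y2 = f(x)   # almost surely differs from y1: new random weights each call

# Building the layer once and closing over it keeps the same weights.
layer = Dense(5 => 2)
g = x -> layer(x)
z1 = g(x)
z2 = g(x)   # same layer, same weights, same output
```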

params can't find the parameters inside the Dense layer used in the anonymous function. You need to manually add those parameters to the set of parameters being trained.
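
A minimal sketch of what that means, assuming Flux 0.13/0.14 (where the implicit Flux.params API used in this thread is still available) and the newer Dense(5 => 2) syntax. The layer is built once and captured by a closure, yet params only reports the arrays of the second Dense layer.

```julia
using Flux  # assumes Flux 0.13/0.14 with implicit `Flux.params`

layer = Dense(5 => 2)                    # weight (2×5) and bias (2)
m = Chain(x -> layer(x), Dense(2 => 5))  # first stage is an anonymous function

# params walks the Chain, but a closure is treated as an opaque leaf,
# so only the second Dense layer's weight and bias are collected.
ps = Flux.params(m)
length(ps)
```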

How exactly would you do that? Are you saying that, if I use an anonymous function, I need to store params(layer1) and pass it as an input to the train! function?

You call params(m), right? You also need to call params on the Dense layer in the anonymous function and concatenate the two.

Any trainable parameters closed over by an anonymous function must be manually extracted and added to the params passed to train!. If you don't want to do this, define a callable struct and run @treelike on it; this is how Dense is defined.
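
A sketch of the callable-struct approach, assuming Flux 0.13/0.14, where Flux.@functor fills the role that @treelike played in older releases. The struct name Wrapped is hypothetical and chosen only for illustration.

```julia
using Flux  # assumes Flux 0.13/0.14; `Flux.@functor` replaced the older `@treelike`

# Hypothetical wrapper layer: a callable struct instead of an anonymous function.
struct Wrapped{L}
    inner::L
end

# Forward pass: simply delegate to the wrapped layer.
(w::Wrapped)(x) = w.inner(x)

# Register the struct so that `Flux.params` recurses into its fields.
Flux.@functor Wrapped

m = Chain(Wrapped(Dense(5 => 2)), Dense(2 => 5))
ps = Flux.params(m)   # both weights and both biases are found
```

Because the struct's fields are visible to params, there is nothing to collect and concatenate by hand.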

Awesome, thank you. That is very helpful, especially knowing that I can just concatenate them.

One thing: it looks like, in order to concatenate the two, I need to call collect() to get an Array, since otherwise the return type of params() is Params. Once I call collect, I can vcat() the arrays, but I'm not sure how to turn the result back into a Params. Is there a way to do this?

Try something like

params((m, layer))

That works, and it is a lot cleaner than the

Params(vcat(collect(params(m)), collect(params(layer))))

that I was using.
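
For comparison, a sketch showing that both forms collect the same arrays. It assumes Flux 0.13/0.14, where Flux.Params and the implicit Flux.params are available, and uses the Dense(5 => 2) syntax.

```julia
using Flux  # assumes Flux 0.13/0.14 with implicit `Flux.params` / `Flux.Params`

layer = Dense(5 => 2)
m = Chain(x -> layer(x), Dense(2 => 5))

# Tuple form: params collects from every element of the tuple.
ps_tuple = Flux.params((m, layer))

# Manual form: collect both Params, vcat them, and rebuild a Params.
ps_manual = Flux.Params(vcat(collect(Flux.params(m)), collect(Flux.params(layer))))

# Params membership is by object identity, so this checks the very same arrays.
same = length(ps_tuple) == length(ps_manual) && all(p -> p in ps_manual, ps_tuple)
```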

How is layer written for the given example?

In other words, params() does not understand the anonymous function, but I assume you do not just substitute the anonymous function for layer, like

params((m, x -> Dense(5, 2)(x)))
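
One way to read the params((m, layer)) suggestion, as a sketch assuming Flux 0.13/0.14: layer is the Dense object itself, bound to a variable (here the hypothetical name layer1) before the Chain is built, and the anonymous function only calls it. Writing x -> Dense(5, 2)(x) inline would instead build a new layer on every call, so there would be no single object to pass to params. The toy data and the single Descent step are illustrative assumptions, used only to show that the closed-over weights now get updated.

```julia
using Flux  # assumes Flux 0.13/0.14 (implicit params and Flux.Optimise)

layer1 = Dense(5 => 2)   # build the layer once, outside the anonymous function
m = Chain(x -> layer1(x), Dense(2 => 2), Dense(2 => 5))

# Pass the layer object itself, not the anonymous function:
# 3 Dense layers × (weight, bias) = 6 arrays.
ps = Flux.params((m, layer1))

# Toy data, for illustration only.
x, y = rand(Float32, 5, 8), rand(Float32, 5, 8)
w_before = copy(layer1.weight)

# One plain gradient-descent step using implicit-parameter gradients.
gs = gradient(() -> Flux.Losses.mse(m(x), y), ps)
Flux.Optimise.update!(Flux.Optimise.Descent(0.1), ps, gs)

trained = layer1.weight != w_before   # the closed-over layer was updated
```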