Has anyone else noticed this? The model
m = Chain(Dense(5, 2), Dense(2, 2), Dense(2, 5))
produces an entirely different result than
m = Chain(x -> Dense(5, 2)(x), Dense(2, 2), Dense(2, 5))
Are anonymous functions somehow not supported in Chain?
Also, I do not mean different in the sense that the randomly generated initial values differ. I mean that when you seed the random number generator and train one versus the other, the anonymous function version not only fails to produce the same output, but even after more training it does not achieve any meaningful fit to the data.
For example, try replacing a Dense layer in any model with a layer written as x -> Dense(…)(x) and see if your model still works as it did before.
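
Here is a minimal sketch that might help narrow this down. It assumes a Flux version where Dense(in, out) and Flux.params are still available; the names m1, m2, and x are just for illustration. It checks two things: whether each model returns the same output when called twice on the same input, and how many parameter arrays Flux can see in each model.

```julia
using Flux, Random

# Plain version: three Dense layers built once, when the Chain is constructed.
Random.seed!(1)
m1 = Chain(Dense(5, 2), Dense(2, 2), Dense(2, 5))

# Anonymous function version: the Dense(5, 2) constructor call sits inside the closure.
Random.seed!(1)
m2 = Chain(x -> Dense(5, 2)(x), Dense(2, 2), Dense(2, 5))

x = rand(Float32, 5)

# Does each model give the same output when called twice on the same input?
@show m1(x) == m1(x)
@show m2(x) == m2(x)

# How many parameter arrays does Flux collect from each model?
@show length(Flux.params(m1))
@show length(Flux.params(m2))
```

If the two models disagree on either check, that would point at how the first layer is constructed and tracked rather than at the training loop.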