`Chain` vs `foldl` in Flux.jl

I am new to both ML and Julia, so this might be a silly question. I am trying to use `foldl` instead of `Chain` to build a model with multiple layers, but the model built with `foldl` does not train. Does anyone know what I am doing wrong?

I have a Jupyter notebook with the following cells:

Cell 1: Setting up the data - create two annular clouds of points to classify

using Plots, Interact, Flux
using Flux: mse, shuffle, throttle

function random_circular_coordinates(radius, num_points, σ=0.1)
    # angles evenly spaced around unit circle
    angles = range(0; stop=(2 * π), length=num_points)

    # randomness added based on the Normal distribution
    points = radius .+ σ * randn(num_points)

    # random coordinates "wrapped" around unit circle
    coordinates = [points .* cos.(angles) points .* sin.(angles)]
    return permutedims(coordinates)
end

num_points = 100
radius_1 = 2
circle_1 = random_circular_coordinates(radius_1, num_points)
radius_2 = 0.5
circle_2 = random_circular_coordinates(radius_2, num_points)

X_train = [circle_1 circle_2]
y1, y2 = -1, 1
Y_train_column = [fill(y1, num_points); fill(y2, num_points)]
Y_train = permutedims(Y_train_column)
τ = 0.0

@show size(X_train)
@show size(Y_train)

iters = 1000
dataset = ((X_train, Y_train) for _ in 1:iters)
opt = ADAM()

Cell 2: Create a model using Chain and train it.

m1 = Chain( Dense(size(X_train,1), 32, relu),
            Dense(32, 1) )
loss1(x, y) = mse(m1(x), y) 
evalcb1() = @show(loss1(X_train, Y_train))
Flux.train!(loss1, params(m1), dataset, opt; cb=throttle(evalcb1, 0.01))

Cell 3: Create the same model using foldl and train it.

layers = [  Dense(size(X_train,1), 32, relu),
            Dense(32, 1) ]
m2(x) = foldl((h, layer) -> layer(h), layers, init = x)
loss2(x, y) = mse(m2(x), y) 
evalcb2() = @show(loss2(X_train, Y_train))
Flux.train!(loss2, params(m2), dataset, opt; cb=throttle(evalcb2, 0.01))

The Chain approach seems to work, but the foldl approach doesn’t.

Does anyone know what I am doing wrong?

The call should be `Flux.train!(loss2, params(layers[1], layers[2]), dataset, opt; cb=throttle(evalcb2, 0.01))`. `m2` is an ordinary function, not a model struct or an array, so calling `params` on it finds no parameter arrays and returns an empty collection (not an error), which leaves `train!` with nothing to update. In contrast, `m1` is a model struct: its type (`Chain`) implements the functor interface that Flux uses to deconstruct a model and retrieve its parameter arrays.
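To make the difference visible, here is a minimal sketch that counts the parameter arrays `params` finds in each case. It assumes the older Flux release used in this thread (implicit `params`, `Dense(in, out, σ)` constructor); the layer sizes and the input `x` are illustrative, and `m1` is built from the same `layers` vector only so the two forward passes can be compared.

```julia
using Flux

# Two layers, as in Cell 3 (sizes are illustrative).
layers = [Dense(2, 32, relu), Dense(32, 1)]

# A plain function that refers to `layers`; it has no fields for Flux to walk.
m2(x) = foldl((h, layer) -> layer(h), layers; init = x)

# The same layer objects wrapped in a Chain, which Flux knows how to traverse.
m1 = Chain(layers...)

@show length(Flux.params(m2))      # parameter arrays found through the function
@show length(Flux.params(layers))  # parameter arrays found through the vector of layers
@show length(Flux.params(m1))      # parameter arrays found through the Chain

# Both forms compute the same forward pass because they share the same layers.
x = rand(Float32, 2, 5)
@show m1(x) ≈ m2(x)
```

If the first count is zero while the other two match, that would explain why the loss never moves in Cell 3: the optimiser is handed an empty set of parameters, even though `m2` itself evaluates correctly.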

Thanks a lot! Yes, that worked; in fact, the following works (where I put in the array of layers as the argument to params):