Why isn't Flux Descent learning?

The following code doesn't learn nearly as well as it should.

Basically, I create two groups, 0 and 1, each containing two points, and want them separated by a sigmoid classifier, using mean squared error as the loss.

During training, the loss function isn’t decreasing, as reported by the callback:

using Flux
using Plots   # needed for plot, plot!, contour, scatter! below
xs_test = [[0.5, 0.25],[0.5, 0.25],[0.5, 0.5],[0.5, 0.5]]
ys_test = [0, 0, 1, 1]

model = Dense(2, 1, σ)   # 2 inputs, 1 sigmoid output
model.W .= ones(1, 2)    # start from all-ones weights
model.b .= ones(1)       # and an all-ones bias

loss(x, y) = Flux.mse(model(x), y)

ps = params(model)
data = zip(xs_test, ys_test)
opt = Descent(0.1)


plot(loss.(xs_test, ys_test), label="before training")
Flux.train!(loss, ps, data, opt; cb = () -> println("Current loss: ", sum(loss.(xs_test, ys_test))))

plot!(loss.(xs_test, ys_test), label="after training")

But after training, there is no improvement in separation:

contour(0:.1:1, 0:.1:1, (x, y) -> model([x,y])[1], fill=true)
scatter!(first.(xs_test[1:2]), last.(xs_test[1:2]), label="group 0")
scatter!(first.(xs_test[3:4]), last.(xs_test[3:4]), label="group 1")
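For reference, the starting loss can be checked by hand in plain Julia, without Flux. With the all-ones `W` and `b`, each group-0 point gives `σ(0.5 + 0.25 + 1) = σ(1.75)` and each group-1 point gives `σ(2)`. The `σ` and `mse` below are my own stdlib stand-ins mirroring what Flux computes for a single output:

```julia
# Hand-check of the pre-training loss, plain Julia only.
σ(z) = 1 / (1 + exp(-z))
mse(ŷ, y) = (ŷ - y)^2          # mean over a 1-element output = squared error

xs = [[0.5, 0.25], [0.5, 0.25], [0.5, 0.5], [0.5, 0.5]]
ys = [0.0, 0.0, 1.0, 1.0]
w, b = [1.0, 1.0], 1.0          # same init as model.W and model.b

total = sum(mse(σ(sum(w .* x) + b), y) for (x, y) in zip(xs, ys))
println(total)                  # ≈ 1.48, the pre-training total loss
```

So the model starts out predicting ≈ 0.85–0.88 for every point, i.e. badly wrong on group 0.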

It does learn; it's just that `train!` makes only a single pass over `data`, so you ran one epoch. Try

 Flux.@epochs 100 Flux.train!(loss, ps, data, opt; cb = () -> println("Current loss: ", sum(loss.(xs_test, ys_test))))
