Taking gradient to update a Flux.jl CNN

A few suggestions: monitor the loss and check that it decreases during training; use logitcrossentropy instead of crossentropy and drop the softmax layer from the model (see the short comparison below); and experiment with the learning rate.
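On the loss point: crossentropy expects probabilities, while logitcrossentropy works directly on the raw model outputs (logits) and is numerically more stable. A minimal sketch, where the logits and labels are made up purely for illustration:

using Flux

logits = randn(Float32, 2, 4)            # raw model outputs: 2 classes × 4 samples
y = Flux.onehotbatch([1, 2, 1, 2], 1:2)  # one-hot targets

# crossentropy needs probabilities, so softmax must be applied first
loss_probs = Flux.crossentropy(softmax(logits), y)

# logitcrossentropy takes the logits directly and folds the softmax in
loss_logits = Flux.logitcrossentropy(logits, y)

loss_probs ≈ loss_logits  # true, but the logit version avoids under/overflow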

Here is a simplified, self-contained example where the loss goes down as expected:

using Flux, Optimisers
using Statistics

# Defining a model
model = Chain(
    Conv((5, 5), 3 => 8, relu),         # 128×128×3 -> 124×124×8
    MaxPool((2, 2)),                    # -> 62×62×8
    Conv((5, 5), 8 => 1, pad=1, relu),  # -> 60×60×1
    MaxPool((4, 4)),                    # -> 15×15×1
    Flux.flatten,
    Dense(225 => 64),
    Dense(64 => 32),
    Dense(32 => 2),
)

loss(model, X, y) = Flux.logitcrossentropy(model(X), y)
accuracy(model, X, y) = mean(Flux.onecold(model(X)) .== Flux.onecold(y))

opt_state = Flux.setup(Optimisers.Adam(eta=1e-4), model)

batch_size = 32
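# Dummy data: one batch of random 128×128 RGB images with random labels in 1:2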
X = randn(Float32, 128, 128, 3, batch_size)
labels = rand(1:2, batch_size)
y = Flux.onehotbatch(labels, 1:2)

for epoch = 1:10
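    # Compute the loss and the gradient with respect to the model in one pass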
    train_loss, grad = Flux.withgradient(m -> loss(m, X, y), model)
    # Update the model
    Flux.update!(opt_state, model, grad[1])
    @info epoch accuracy(model, X, y) train_loss
end 

Output:

┌ Info: 1
│   accuracy(model, X, y) = 0.75
└   train_loss = 0.6162462f0
┌ Info: 2
│   accuracy(model, X, y) = 0.75
└   train_loss = 0.6135477f0
┌ Info: 3
│   accuracy(model, X, y) = 0.78125
└   train_loss = 0.6074496f0
┌ Info: 4
│   accuracy(model, X, y) = 0.78125
└   train_loss = 0.59843326f0
┌ Info: 5
│   accuracy(model, X, y) = 0.8125
└   train_loss = 0.5873677f0
┌ Info: 6
│   accuracy(model, X, y) = 0.8125
└   train_loss = 0.57528824f0
┌ Info: 7
│   accuracy(model, X, y) = 0.84375
└   train_loss = 0.5630218f0
┌ Info: 8
│   accuracy(model, X, y) = 0.84375
└   train_loss = 0.5513245f0
┌ Info: 9
│   accuracy(model, X, y) = 0.875
└   train_loss = 0.5406903f0
┌ Info: 10
│   accuracy(model, X, y) = 0.875
└   train_loss = 0.5314034f0
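Two follow-ups on the other suggestions. Since the softmax now lives outside the model, apply it at prediction time if you want class probabilities (onecold on the raw logits already gives the predicted class), and the learning rate can be changed between epochs through the optimiser state. A sketch reusing model, X and opt_state from above, assuming a recent Optimisers.jl that provides adjust!; the 1e-5 is just an example value:

probs = softmax(model(X))             # class probabilities per sample
preds = Flux.onecold(model(X), 1:2)   # predicted labels; no softmax needed here

Optimisers.adjust!(opt_state, 1e-5)   # lower the learning rate for later epochs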