Some suggestions: monitor the loss and check that it decreases during training; use logitcrossentropy instead of crossentropy and drop the softmax from the model; and experiment with the learning rate.
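The reason that swap is safe: logitcrossentropy(ŷ, y) is mathematically equivalent to crossentropy(softmax(ŷ), y), it just computes the result from the raw logits in a more numerically stable way. A minimal sketch with random logits, purely for illustration:

using Flux

ŷ = 20 .* randn(Float32, 2, 8)           # raw logits for 2 classes, batch of 8, some large
y = Flux.onehotbatch(rand(1:2, 8), 1:2)  # one-hot targets

Flux.crossentropy(softmax(ŷ), y)   # softmax in the model, plain crossentropy as the loss
Flux.logitcrossentropy(ŷ, y)       # ≈ the same value, computed directly from the logits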
Here is a simplified example where the loss goes down as expected:
using Flux, Optimisers
using Statistics
# Define the model (no final softmax: the loss works on raw logits)
model = Chain(
    Conv((5, 5), 3 => 8, relu),         # 128×128×3 -> 124×124×8
    MaxPool((2, 2)),                    # -> 62×62×8
    Conv((5, 5), 8 => 1, relu; pad=1),  # -> 60×60×1
    MaxPool((4, 4)),                    # -> 15×15×1
    Flux.flatten,                       # -> 225
    Dense(225 => 64),
    Dense(64 => 32),
    Dense(32 => 2),                     # 2 output logits
)
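# Optional sanity check (not in the original post): Flux.outputsize confirms the
# flattened feature size that feeds Dense(225 => 64).
@assert Flux.outputsize(model[1:5], (128, 128, 3, 1)) == (225, 1)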
loss(model, X, y) = Flux.logitcrossentropy(model(X), y)
accuracy(model, X, y) = mean(Flux.onecold(model(X)) .== Flux.onecold(y))
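# Note: Flux.onecold takes the argmax along the first dimension, so it can be applied
# directly to the raw logits; softmax does not change the argmax, which is another
# reason the final softmax layer can be dropped.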
# Adam optimiser state; eta is the learning rate, one of the first knobs to tune
opt_state = Flux.setup(Optimisers.Adam(eta=1e-4), model)
# Random data standing in for real images and labels (only to demonstrate the loop)
batch_size = 32
X = randn(Float32, 128, 128, 3, batch_size)   # width × height × channels × batch
labels = rand(1:2, batch_size)
y = Flux.onehotbatch(labels, 1:2)             # 2×32 one-hot targets
for epoch = 1:10
    train_loss, grad = Flux.withgradient(m -> loss(m, X, y), model)
    # Update the model
    Flux.update!(opt_state, model, grad[1])
    @info epoch accuracy(model, X, y) train_loss
end
Output:
┌ Info: 1
│ accuracy(model, X, y) = 0.75
└ train_loss = 0.6162462f0
┌ Info: 2
│ accuracy(model, X, y) = 0.75
└ train_loss = 0.6135477f0
┌ Info: 3
│ accuracy(model, X, y) = 0.78125
└ train_loss = 0.6074496f0
┌ Info: 4
│ accuracy(model, X, y) = 0.78125
└ train_loss = 0.59843326f0
┌ Info: 5
│ accuracy(model, X, y) = 0.8125
└ train_loss = 0.5873677f0
┌ Info: 6
│ accuracy(model, X, y) = 0.8125
└ train_loss = 0.57528824f0
┌ Info: 7
│ accuracy(model, X, y) = 0.84375
└ train_loss = 0.5630218f0
┌ Info: 8
│ accuracy(model, X, y) = 0.84375
└ train_loss = 0.5513245f0
┌ Info: 9
│ accuracy(model, X, y) = 0.875
└ train_loss = 0.5406903f0
┌ Info: 10
│ accuracy(model, X, y) = 0.875
└ train_loss = 0.5314034f0
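On real data you would iterate over minibatches instead of a single fixed batch. A sketch of the same loop using Flux.DataLoader and Flux.train!, where Xtrain (a 128×128×3×N Float32 array) and ytrain (a length-N vector of 1/2 labels) are hypothetical placeholders for your dataset:

ytrain_hot = Flux.onehotbatch(ytrain, 1:2)
train_loader = Flux.DataLoader((Xtrain, ytrain_hot); batchsize=32, shuffle=true)

for epoch in 1:10
    Flux.train!(loss, model, train_loader, opt_state)   # one gradient step per minibatch
    @info epoch accuracy(model, Xtrain, ytrain_hot) loss(model, Xtrain, ytrain_hot)
end

With minibatching you would also evaluate the loss and accuracy on a held-out validation set rather than on the training data, which is what ultimately tells you whether the model is learning something useful.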