Taking gradient to update a Flux.jl CNN

Hello, I'm new to Julia and Flux.jl, and I've been working through some toy problems to learn. I've been struggling to get my networks to train: I have yet to get Flux.train! to work, so I've been writing my gradient-descent loops manually. I got a model working for MNIST, and now I'm writing one to classify images of cats and dogs. I've spent way too many hours trying to figure out why this one isn't working. I've rooted out all the outright errors, but my training loop still completely fails to minimize the loss. As far as I can tell from the raw output of the model (a softmax), the network becomes more and more confident in its predictions, but the predictions themselves remain nearly random.

My first thought was that I was passing incorrect labels to the gradient function or something like that. I've investigated that possibility carefully and it doesn't seem to be the issue (there's a small sanity check after the data-preparation code below). I now think the problem must be in my calls to gradient and update!. Even so, I've cut my code down (I took out the data augmentation, which is what I was trying to practice in the first place, haha) and posted all of it, just in case the mess is caused by some other dumb thing I'm doing.

This first chunk just imports the packages and loads the dataset of cat and dog images, which I downloaded onto my machine from Kaggle. I'm pretty confident it's running fine.

using FileIO
using Images
using Plots
using Augmentor
using Random
using Flux
using Flux: OneHotArrays
using Flux: onecold
using Optimisers
using Statistics
using Flux: crossentropy

# Set defaults to display images without junk
default(showaxis = false)
default(grid = false)

# Locate the files
cat_dir = "augmentation/cats/"
dog_dir = "augmentation/dogs/"
cat_files = readdir(cat_dir)
dog_files = readdir(dog_dir)

# Combine cats and dogs into a list of file locations and a list of labels
data_files  = vcat(cat_dir .* cat_files, dog_dir .* dog_files)
possible_labels = [:cat, :dog]
data_labels = Flux.onehotbatch(
    vcat(fill(:cat, length(cat_files)), fill(:dog, length(dog_files))),
    possible_labels
)
N_data = length(data_files)

# Import images of cats and dogs as matrices
img_dims = (128, 128) # All images will be rescaled to this dimension
# The last (4th) dimension of this array indexes the samples and corresponds to data_labels.
# Flux's Conv layers expect this width × height × channels × batch (WHCN) layout.
data_images = Array{Float32, 4}(undef, img_dims..., 3, N_data)

# Populate array instantiated above
for (index, fileloc) in zip(1:N_data, data_files)
    img_as_rgb = FileIO.load(fileloc)
    img_as_resized_rgb = imresize(img_as_rgb, img_dims)
    img_as_channels = convert(
        Array{Float32, 3}, # rows × cols × channels
        permutedims(channelview(img_as_resized_rgb), (2, 3, 1))
    )
    data_images[:, :, :, index] = img_as_channels
end

# Train-Test split
randomized_indices = shuffle(1:N_data)
train_test_split_index = 7 * N_data ÷ 10 # 7/10 of the set is train
# Indices of training data and testing data respectively
train_indices = randomized_indices[1:train_test_split_index]
test_indices = randomized_indices[train_test_split_index+1:end]


# Create the datasets for training and testing
train_images = data_images[:, :, :, train_indices]
train_labels = data_labels[:, train_indices]
test_images = data_images[:, :, :, test_indices]
test_labels = data_labels[:, test_indices]
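
To rule out a label mix-up, this is roughly the kind of sanity check I ran (a sketch, not my exact code): onecold should invert onehotbatch, and a few spot-checked file paths should agree with their decoded labels.

# Sanity check (sketch): onecold should invert onehotbatch for this label encoding
example_labels = [:cat, :dog, :cat]
@assert onecold(Flux.onehotbatch(example_labels, possible_labels), possible_labels) == example_labels

# Spot-check a few training examples: the file path should agree with the decoded label
for i in train_indices[1:3]
    println(data_files[i], " => ", onecold(data_labels[:, i], possible_labels))
end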

What I really can't get working is this next chunk. I'd love to use Flux.train! (I've put a sketch of how I understand it should be called after the loop below), but since I can't get it to behave, I'm writing the training loop explicitly as best I can.

# Defining a model
model = Chain(
    Conv((5, 5), 3 => 8, relu),         # 128×128×3 -> 124×124×8
    MaxPool((2, 2)),                    # -> 62×62×8
    Conv((5, 5), 8 => 1, pad=1, relu),  # -> 60×60×1
    MaxPool((4, 4)),                    # -> 15×15×1
    Flux.flatten,
    Dense(225 => 64),
    Dense(64 => 32),
    Dense(32 => 2),
    softmax
)

loss(model, X, y) = sum(crossentropy(model(X), y))
opt_state = Flux.setup(Optimisers.Adam(), model)
accuracy(X, y) = mean(onecold(model(X), possible_labels) .== onecold(y, possible_labels))

epochs = 10
for epoch = 1:epochs
    for index in range(1, size(train_images)[4])
        # Get a picture and label to train with
        picture = train_images[:,:,:,index]
        label = train_labels[:,index]

        # Take the gradient
        grad = gradient(loss, model, reshape(picture, (img_dims..., 3, 1)), label)[1]
        # Update the model
        Flux.update!(opt_state, model, grad)
    end
    plot()
    @show accuracy(reshape(test_images, (img_dims..., 3, :)), test_labels)
end 
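
For reference, here is roughly how I understand the Flux.train! version would look (just a sketch on my part, assuming mini-batches from Flux.DataLoader), in case the mistake is in how I'm trying to call it:

# Sketch of the Flux.train! version I would like to use (new-style, explicit opt_state API)
train_loader = Flux.DataLoader((train_images, train_labels), batchsize=32, shuffle=true)
for epoch in 1:epochs
    # train! calls loss(model, x, y) on every (x, y) batch yielded by the loader
    Flux.train!(loss, model, train_loader, opt_state)
    @show accuracy(test_images, test_labels)
end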

Why might the model not be converging? I've read all the docs I can find, but I'm pretty new to deep learning, so the issue might still be something really simple. I'd love to understand what's going on and get this working so I can use Julia more in my work!

Some suggestions: monitor the loss and check that it is decreasing while training; use logitcrossentropy instead of crossentropy and remove the softmax from the model; play with the learning rate.
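
One note on removing the softmax: the model then outputs raw logits, which is exactly what logitcrossentropy expects. If you still want probabilities (e.g. to inspect confidence), apply softmax outside the model; onecold gives the same predicted class either way. A small sketch of what I mean (probabilities and predicted_labels are just helper names I'm making up here):

# Model now returns logits; softmax is applied only when probabilities are needed
probabilities(model, X) = softmax(model(X))
# onecold picks the argmax, so it gives the same result on logits as on probabilities
predicted_labels(model, X) = Flux.onecold(model(X), [:cat, :dog])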

Here is a simplified, self-contained example (random data, a single batch) showing the loss going down as it should:

using Flux, Optimisers
using Statistics

# Defining a model
model = Chain(
    Conv((5, 5), 3 => 8, relu),         # 128×128×3 -> 124×124×8
    MaxPool((2, 2)),                    # -> 62×62×8
    Conv((5, 5), 8 => 1, pad=1, relu),  # -> 60×60×1
    MaxPool((4, 4)),                    # -> 15×15×1
    Flux.flatten,
    Dense(225 => 64),
    Dense(64 => 32),
    Dense(32 => 2),
)

loss(model, X, y) = Flux.logitcrossentropy(model(X), y)
accuracy(model, X, y) = mean(Flux.onecold(model(X)) .== Flux.onecold(y))

opt_state = Flux.setup(Optimisers.Adam(eta=1e-4), model)

batch_size = 32
X = randn(Float32, 128, 128, 3, batch_size)
labels = rand(1:2, batch_size)
y = Flux.onehotbatch(labels, 1:2)

for epoch = 1:10
    train_loss, grad = Flux.withgradient(m -> loss(m, X, y), model)
    # Update the model
    Flux.update!(opt_state, model, grad[1])
    @info epoch accuracy(model, X, y) train_loss
end 

Output:

┌ Info: 1
│   accuracy(model, X, y) = 0.75
└   train_loss = 0.6162462f0
┌ Info: 2
│   accuracy(model, X, y) = 0.75
└   train_loss = 0.6135477f0
┌ Info: 3
│   accuracy(model, X, y) = 0.78125
└   train_loss = 0.6074496f0
┌ Info: 4
│   accuracy(model, X, y) = 0.78125
└   train_loss = 0.59843326f0
┌ Info: 5
│   accuracy(model, X, y) = 0.8125
└   train_loss = 0.5873677f0
┌ Info: 6
│   accuracy(model, X, y) = 0.8125
└   train_loss = 0.57528824f0
┌ Info: 7
│   accuracy(model, X, y) = 0.84375
└   train_loss = 0.5630218f0
┌ Info: 8
│   accuracy(model, X, y) = 0.84375
└   train_loss = 0.5513245f0
┌ Info: 9
│   accuracy(model, X, y) = 0.875
└   train_loss = 0.5406903f0
┌ Info: 10
│   accuracy(model, X, y) = 0.875
└   train_loss = 0.5314034f0

Thank you for your suggestions!