Taking gradient to update a Flux.jl CNN

Hello, I'm new to Julia and Flux.jl, and I've been working through some toy problems to learn. I've been struggling to get my networks to train: I have yet to get Flux.train! to work, so I've been writing my gradient-descent loops manually. I got a model working for MNIST, and now I'm writing one to classify images of cats and dogs. I've spent way too many hours trying to figure out why this one isn't working. I've rooted out all the outright errors, but my training loop still completely fails to minimize the loss. As far as I can tell from the raw output of the model (a softmax), the network becomes more and more confident in its predictions, but the predictions themselves remain nearly random.

My first thought was that I was passing incorrect labels to the gradient function or something like that. I've investigated that possibility carefully and it doesn't seem to be the issue (there's a small sanity check after the data-preparation code below). I now think the problem must be in my calls to gradient and update!. Even so, I've cut my code down (I took out the data augmentation, which is what I was trying to practice in the first place, haha) and posted all of it, just in case the mess is caused by some other dumb thing I'm doing.

This first chunk just imports the packages and loads the dataset of cat and dog images, which I downloaded onto my machine from Kaggle. I'm pretty confident it's running fine.

using FileIO
using Images
using Plots
using Augmentor
using Random
using Flux
using Flux: OneHotArrays
using Flux: onecold
using Optimisers
using Statistics
using Flux: crossentropy

# Set defaults to display images without junk
default(showaxis = false)
default(grid = false)

# Locate the files
cat_dir = "augmentation/cats/"
dog_dir = "augmentation/dogs/"
cat_files = readdir(cat_dir)
dog_files = readdir(dog_dir)

# Combine cats and dogs into a list of file locations and a list of labels
data_files  = vcat(cat_dir .* cat_files, dog_dir .* dog_files)
possible_labels = [:cat, :dog]
data_labels = Flux.onehotbatch(
    vcat(fill(:cat, length(cat_files)), fill(:dog, length(dog_files))),
    possible_labels
)
N_data = length(data_files)

# Import images of cats and dogs as matrices
img_dims = (128, 128) # All images will be rescaled to this dimension
# The last (4th) dimension of this array indexes the samples and corresponds to data_labels.
# Flux's Conv layers expect this width × height × channels × batch (WHCN) layout.
data_images = Array{Float32, 4}(undef, img_dims..., 3, N_data)

# Populate array instantiated above
for (index, fileloc) in zip(1:N_data, data_files)
    img_as_rgb = FileIO.load(fileloc)
    img_as_resized_rgb = imresize(img_as_rgb, img_dims)
    img_as_channels = convert(
        Array{Float32, 3}, # rows × cols × channels
        permutedims(channelview(img_as_resized_rgb), (2, 3, 1))
    )
    data_images[:, :, :, index] = img_as_channels
end

# Train-Test split
randomized_indices = shuffle(1:N_data)
train_test_split_index = 7 * N_data ÷ 10 # 7/10 of the set is train
# Indices of training data and testing data respectively
train_indices = randomized_indices[1:train_test_split_index]
test_indices = randomized_indices[train_test_split_index+1:end]


# Create the datasets for training and testing
train_images = data_images[:, :, :, train_indices]
train_labels = data_labels[:, train_indices]
test_images = data_images[:, :, :, test_indices]
test_labels = data_labels[:, test_indices]
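
To rule out a label mix-up, this is roughly the kind of sanity check I ran (a sketch, not my exact code): onecold should invert onehotbatch, and a few spot-checked file paths should agree with their decoded labels.

# Sanity check (sketch): onecold should invert onehotbatch for this label encoding
example_labels = [:cat, :dog, :cat]
@assert onecold(Flux.onehotbatch(example_labels, possible_labels), possible_labels) == example_labels

# Spot-check a few training examples: the file path should agree with the decoded label
for i in train_indices[1:3]
    println(data_files[i], " => ", onecold(data_labels[:, i], possible_labels))
end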

What I really can't get working is this next chunk. I'd love to use Flux.train! (I've put a sketch of how I understand it should be called after the loop below), but since I can't get it to behave, I'm writing the training loop explicitly as best I can.

# Defining a model
model = Chain(
    Conv((5, 5), 3 => 8, relu),         # 128×128×3 -> 124×124×8
    MaxPool((2, 2)),                    # -> 62×62×8
    Conv((5, 5), 8 => 1, pad=1, relu),  # -> 60×60×1
    MaxPool((4, 4)),                    # -> 15×15×1
    Flux.flatten,
    Dense(225 => 64),
    Dense(64 => 32),
    Dense(32 => 2),
    softmax
)

loss(model, X, y) = sum(crossentropy(model(X), y))
opt_state = Flux.setup(Optimisers.Adam(), model)
accuracy(X, y) = mean(onecold(model(X), possible_labels) .== onecold(y, possible_labels))

epochs = 10
for epoch = 1:epochs
    for index in range(1, size(train_images)[4])
        # Get a picture and label to train with
        picture = train_images[:,:,:,index]
        label = train_labels[:,index]

        # Take the gradient
        grad = gradient(loss, model, reshape(picture, (img_dims..., 3, 1)), label)[1]
        # Update the model
        Flux.update!(opt_state, model, grad)
    end
    plot()
    @show accuracy(reshape(test_images, (img_dims..., 3, :)), test_labels)
end 
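
For reference, here is roughly how I understand the Flux.train! version would look (just a sketch on my part, assuming mini-batches from Flux.DataLoader), in case the mistake is in how I'm trying to call it:

# Sketch of the Flux.train! version I would like to use (new-style, explicit opt_state API)
train_loader = Flux.DataLoader((train_images, train_labels), batchsize=32, shuffle=true)
for epoch in 1:epochs
    # train! calls loss(model, x, y) on every (x, y) batch yielded by the loader
    Flux.train!(loss, model, train_loader, opt_state)
    @show accuracy(test_images, test_labels)
end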

Why might the model not be converging? I've read all the docs I can find, but I'm pretty new to deep learning, so the issue might still be something really simple. I'd love to understand what's going on and get this working so I can use Julia more in my work!

Some suggestions: monitor the loss and check that it is decreasing while training; use logitcrossentropy instead of crossentropy and remove the softmax from the model; play with the learning rate.
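
One note on removing the softmax: the model then outputs raw logits, which is exactly what logitcrossentropy expects. If you still want probabilities (e.g. to inspect confidence), apply softmax outside the model; onecold gives the same predicted class either way. A small sketch of what I mean (probabilities and predicted_labels are just helper names I'm making up here):

# Model now returns logits; softmax is applied only when probabilities are needed
probabilities(model, X) = softmax(model(X))
# onecold picks the argmax, so it gives the same result on logits as on probabilities
predicted_labels(model, X) = Flux.onecold(model(X), [:cat, :dog])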

Here is a simplified, self-contained example (random data, a single batch) showing the loss going down as it should:

using Flux, Optimisers
using Statistics

# Defining a model
model = Chain(
    Conv((5, 5), 3 => 8, relu),         # 128×128×3 -> 124×124×8
    MaxPool((2, 2)),                    # -> 62×62×8
    Conv((5, 5), 8 => 1, pad=1, relu),  # -> 60×60×1
    MaxPool((4, 4)),                    # -> 15×15×1
    Flux.flatten,
    Dense(225 => 64),
    Dense(64 => 32),
    Dense(32 => 2),
)

loss(model, X, y) = Flux.logitcrossentropy(model(X), y)
accuracy(model, X, y) = mean(Flux.onecold(model(X)) .== Flux.onecold(y))

opt_state = Flux.setup(Optimisers.Adam(eta=1e-4), model)

batch_size = 32
X = randn(Float32, 128, 128, 3, batch_size)
labels = rand(1:2, batch_size)
y = Flux.onehotbatch(labels, 1:2)

for epoch = 1:10
    train_loss, grad = Flux.withgradient(m -> loss(m, X, y), model)
    # Update the model
    Flux.update!(opt_state, model, grad[1])
    @info epoch accuracy(model, X, y) train_loss
end 

Output:

┌ Info: 1
│   accuracy(model, X, y) = 0.75
└   train_loss = 0.6162462f0
┌ Info: 2
│   accuracy(model, X, y) = 0.75
└   train_loss = 0.6135477f0
┌ Info: 3
│   accuracy(model, X, y) = 0.78125
└   train_loss = 0.6074496f0
┌ Info: 4
│   accuracy(model, X, y) = 0.78125
└   train_loss = 0.59843326f0
┌ Info: 5
│   accuracy(model, X, y) = 0.8125
└   train_loss = 0.5873677f0
┌ Info: 6
│   accuracy(model, X, y) = 0.8125
└   train_loss = 0.57528824f0
┌ Info: 7
│   accuracy(model, X, y) = 0.84375
└   train_loss = 0.5630218f0
┌ Info: 8
│   accuracy(model, X, y) = 0.84375
└   train_loss = 0.5513245f0
┌ Info: 9
│   accuracy(model, X, y) = 0.875
└   train_loss = 0.5406903f0
┌ Info: 10
│   accuracy(model, X, y) = 0.875
└   train_loss = 0.5314034f0

Thank you for your suggestions!