Hello, I’m new to Julia and Flux.jl, and I’ve been working on some toy problems to learn. I’ve been struggling to get my networks to train: I have yet to get Flux.train! to work, so I’ve been writing my gradient descent loops manually. I got a model working for MNIST, and now I’m writing one to classify images of cats and dogs. I’ve spent far too many hours trying to figure out why it isn’t working. I’ve rooted out all the outright errors, but my training loop still completely fails to minimize the loss. As far as I can tell from the raw (softmax) output of the model, the network gets more and more confident in its predictions, but the predictions themselves remain nearly random.
My first thought was that I was passing incorrect labels to the gradient function, but I’ve investigated that carefully and it doesn’t seem to be the issue. I now suspect the problem is in my calls to gradient and update!. Just in case it’s some other dumb thing I’m doing, I’ve cut the code down (I took out the data augmentation, which is what I was trying to practice in the first place, haha) and posted all of it below.
This first chunk just imports packages and loads the dataset of cat and dog images, which I downloaded from Kaggle onto my machine. I’m pretty confident it’s running fine.
using FileIO
using Images
using Plots
using Augmentor
using Random
using Flux
using Flux: onecold, crossentropy
using Optimisers
using Statistics
# Set defaults to display images without junk
default(showaxis = false)
default(grid = false)
# Locate the files
cat_dir = "augmentation/cats/"
dog_dir = "augmentation/dogs/"
cat_files = readdir(cat_dir)
dog_files = readdir(dog_dir)
# Combine cats and dogs into a list of file locations and a list of labels
data_files = vcat(cat_dir .* cat_files, dog_dir .* dog_files)
possible_labels = [:cat, :dog]
data_labels = Flux.onehotbatch(
vcat(fill(:cat, length(cat_files)), fill(:dog, length(dog_files))),
possible_labels
)
N_data = length(data_files)
# Import images of cats and dogs as matrices
img_dims = (128, 128) # All images will be rescaled to this dimension
# The last (batch) dimension indexes samples and matches the columns of data_labels.
# Flux’s Conv layers expect inputs in WHCN order: width, height, channels, batch.
data_images = Array{Float32, 4}(undef, img_dims..., 3, N_data)
# Populate array instantiated above
for (index, fileloc) in enumerate(data_files)
img_as_rgb = FileIO.load(fileloc)
img_as_resized_rgb = imresize(img_as_rgb, img_dims)
img_as_channels = convert(
Array{Float32, 3}, # rows × cols × channels
permutedims(channelview(img_as_resized_rgb), (2, 3, 1))
)
data_images[:, :, :, index] = img_as_channels
end
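After the loop, I did check that the array comes out with the shape and value range I expect (if I understand channelview correctly, the values should already be in [0, 1]):
# Sanity check on the assembled image array
@show size(data_images) # expecting (128, 128, 3, N_data)
@show extrema(data_images) # expecting all values within [0, 1]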
# Train-Test split
randomized_indices = shuffle(1:N_data)
train_test_split_index = 7 * N_data ÷ 10 # 7/10 of the set is train
# Indices of training data and testing data respectively
train_indices = randomized_indices[1:train_test_split_index]
test_indices = randomized_indices[train_test_split_index+1:end]
# Create the datasets for training and testing
train_images = data_images[:, :, :, train_indices]
train_labels = data_labels[:, train_indices]
test_images = data_images[:, :, :, test_indices]
test_labels = data_labels[:, test_indices]
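Since my first suspicion was mislabeled data, this is the kind of spot check I’ve been doing to convince myself the images and one-hot labels still line up after the shuffle and split (colorview is just how I convert back for display; there may be a nicer way):
# Display a random training image together with the label stored for it
check_index = rand(1:size(train_images, 4))
check_label = onecold(train_labels[:, check_index], possible_labels)
check_image = colorview(RGB, permutedims(train_images[:, :, :, check_index], (3, 1, 2)))
plot(check_image, title = string(check_label))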
What I really can’t manage to get working is this next chunk. I would love to use Flux.train!, but I haven’t gotten it to work, so I’m writing the training loop explicitly as best I can. (I’ve pasted the train! version I was attempting at the very end, in case that helps.)
# Defining a model
model = Chain(
Conv((5, 5), 3 => 8, relu), # 128×128×3 -> 124×124×8
MaxPool((2, 2)), # -> 62×62×8
Conv((5, 5), 8 => 1, pad=1, relu), # -> 60×60×1
MaxPool((4, 4)), # -> 15×15×1
Flux.flatten,
Dense(225 => 64),
Dense(64 => 32),
Dense(32 => 2),
softmax
)
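For what it’s worth, I did try to confirm the 225 feeding into the first Dense layer using outputsize (assuming I’m reading its docs right):
# Size coming out of the conv/pool/flatten stack (layers 1–5 of the Chain)
Flux.outputsize(model[1:5], (128, 128, 3, 1)) # should be (225, 1) if my shape comments are right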
loss(model, X, y) = sum(crossentropy(model(X), y))
opt_state = Flux.setup(Optimisers.Adam(), model)
accuracy(X, y) = mean(onecold(model(X), possible_labels) .== onecold(y, possible_labels))
epochs = 10
for epoch = 1:epochs
for index in 1:size(train_images, 4)
# Get a picture and label to train with
picture = train_images[:,:,:,index]
label = train_labels[:,index]
# Take the gradient of the loss with respect to the model ([1] selects the model gradient)
grad = gradient(loss, model, reshape(picture, (img_dims..., 3, 1)), label)[1]
# Update the model
Flux.update!(opt_state, model, grad)
end
@show accuracy(test_images, test_labels)
end
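For completeness, this is roughly the Flux.train! version I was attempting before falling back to the manual loop above (the batch size of 32 is just something I tried, and I’m not at all sure I have the call right, which is part of why I gave up on it):
# Attempted train! version: iterate over minibatches from a DataLoader
train_loader = Flux.DataLoader((train_images, train_labels), batchsize = 32, shuffle = true)
for epoch in 1:epochs
    Flux.train!(loss, model, train_loader, opt_state)
    @show accuracy(test_images, test_labels)
end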
Why might the model not be converging? I’ve read all the docs I can find, but I’m pretty new to deep learning, so the issue might still be something really simple. I would love to understand what’s going on and get this working so I can use Julia more in my work!