Hi folks,
Background: I am having a bit of trouble training a deep neural network. My data consists of grayscale images of size 128×128. My training data is a 128×128×5116 array of images and my testing data is a 128×128×20484 array.
Goal: For each of these images, I need to predict one of four possible labels (i.e. 1, 2, 3, 4).
Problem: I have been tinkering with my learning rate and with how I optimize the neural network I am building, but I keep getting exactly the same accuracy on both my training and test sets. The loss changes ever so slightly, but not by much. I feel like I am doing something wrong in my optimization steps. Could someone take a look at my code and see if I am doing something “wrong”?
Edit 1: Upon further scrutiny, I feel it has something to do with how I am computing accuracy. Am I doing something wrong there specifically?
Edit 2: Never mind, I think that part is working correctly, so I am unsure why I always get an accuracy of 25% on both training and testing.
Code:
Defining my small neural network:
using Flux, Statistics  # Statistics provides `mean`, used in the accuracy helper below

mri_neural_network = Chain(
    Dense(128^2 => 32, relu),
    Dense(32 => 4, relu),
    softmax
)
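For reference, the chain behaves as I would expect on a dummy batch of random inputs (made-up data, not my real images):
x_dummy = rand(Float32, 128^2, 8)
size(mri_neural_network(x_dummy))          # (4, 8): one probability column per image
sum(mri_neural_network(x_dummy); dims=1)   # every column sums to ~1 because of the softmax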
Then I define some helper functions:
"""
Convenience function that bundles the features and one-hot targets into a DataLoader for training.
"""
function simple_loader(data; batchsize::Int=64)
    x2dim = reshape(data.features, 128^2, :)
    yhot = Flux.onehotbatch(data.targets, 0:3)
    Flux.DataLoader((x2dim, yhot); batchsize, shuffle=true)
end
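Used on the training set, each batch should come out with the flattened images as columns (the sizes in the comments assume the default batchsize of 64):
x_batch, y_batch = first(simple_loader(training))
size(x_batch)   # (16384, 64): each column is one flattened 128×128 image
size(y_batch)   # (4, 64): one-hot columns over the labels 0:3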
"""
Calculates the accuracy (in %) of a given model on a dataset.
"""
function simple_accuracy(model, data)
    (x, y) = only(simple_loader(data; batchsize=length(data.targets)))
    y_hat = model(x)
    iscorrect = Flux.onecold(y_hat) .== Flux.onecold(y)
    round(100 * mean(iscorrect); digits=2)
end
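Regarding Edits 1 and 2 above: a toy check with made-up probabilities like the one below is what makes me think the onecold comparison itself is fine.
y_true = Flux.onehotbatch([0, 1, 2, 3], 0:3)   # four samples, one per class
y_pred = Float32[0.7 0.1 0.1 0.1;              # rows are classes 0:3, columns are samples
                 0.1 0.7 0.1 0.1;
                 0.1 0.1 0.1 0.7;
                 0.1 0.1 0.7 0.1]
# samples 1 and 2 are predicted correctly, samples 3 and 4 are not:
mean(Flux.onecold(y_pred) .== Flux.onecold(y_true))   # 0.5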
Then I define the Descent optimizer I want to use during training:
learning_rate = 1
optimizer = Descent
mri_optim = Flux.setup(
    optimizer(learning_rate),
    mri_neural_network
);
Let the model train:
epochs = 20
losses = []
train_accs = []
test_accs = []
train_loader = simple_loader(training)
for epoch in 1:epochs
    model_loss = 0.0
    for (x, y) in train_loader
        curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_neural_network)
        Flux.update!(mri_optim, mri_neural_network, gradients[1])
        model_loss += curr_loss / length(train_loader)
    end
    train_acc = simple_accuracy(mri_neural_network, training)
    test_acc = simple_accuracy(mri_neural_network, testing)
    push!(losses, model_loss)
    push!(train_accs, train_acc)
    push!(test_accs, test_acc)
    @info "After epoch = $epoch" model_loss train_acc test_acc
end
Output Example: Here is an example of the output I am seeing:
┌ Info: After epoch = 1
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 2
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 3
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
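One thing I noticed while writing this up: unless I am misreading it, the loss is stuck at essentially the cross-entropy of a completely uniform prediction over the four classes:
julia> log(4)   # -log(1/4), the loss of predicting 25% for every class
1.3862943611198906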
Additional Notes: Just as an example, here is what the training data (data.features and data.targets) looks like in the code:
julia> training.features
128×128×20484 Array{Float32, 3}:
[:, :, 1] =
 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 ⋮                             ⋱                         ⋮
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
julia> training.targets
20484-element Vector{Int32}:
0
0
0
0
0
 ⋮
3
3
3
3
3
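In case the class balance is relevant to the flat 25%, the per-class counts can be checked with something like this (I can post the actual numbers if useful):
[count(==(c), training.targets) for c in 0:3]   # number of samples for each label 0:3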
Any ideas about what I could be doing wrong? Any more information I could provide?
Cheers!
~ tcp