NN Getting Same Training and Testing Accuracies?

TheCedarPrince · March 21, 2024, 9:26pm

Hi folks,

Background: I am having a bit of trouble training a deep neural network. My data consists of grayscale images with dimensions 128x128. The training data is a 128x128x20484 array of images and the testing data is a 128x128x5116 array.

Goal: With each of these images, I am supposed to predict a label from amongst four potential labels (i.e. 1, 2, 3, 4).

Problem: I am trying to tinker with my learning rates and optimizing the neural network I am building but seem to continue getting the exact same accuracies for both my training and test sets. The loss seems to be changing ever so slightly but not hugely. I feel like I am doing something wrong with my optimization steps. Could someone take a look at my code and see if I am doing something “wrong”?

Edit 1: upon further scrutiny, I feel it has something to do with how I am computing accuracy. Am I doing something wrong there specifically?

Edit 2: never mind, I think that is working correctly, so I am unsure why I always get an accuracy of 25% for both testing and training.

Code:

Defining my small neural network:

using Flux
using Statistics: mean  # mean is used by simple_accuracy below

mri_neural_network = Chain(
    Dense(128^2 => 32, relu),
    Dense(32 => 4, relu),
    softmax
)

Then I define some helper functions:

"""
    Convenience function to have data altogether in one place for training.
"""
function simple_loader(data; batchsize::Int=64)
    x2dim = reshape(data.features, 128^2, :)
    yhot = Flux.onehotbatch(data.targets, 0:3)
    Flux.DataLoader((x2dim, yhot); batchsize, shuffle=true)
end

"""
    Calculates accuracy for a given model
"""
function simple_accuracy(model, data)
    (x, y) = only(simple_loader(data; batchsize=length(data.targets)))
    y_hat = model(x)
    iscorrect = Flux.onecold(y_hat) .== Flux.onecold(y)
    acc = round(100 * mean(iscorrect); digits=2)
end

Then I define the descent optimizer I want to use during training optimization:

learning_rate = 1
optimizer = Descent

mri_optim = Flux.setup(
    optimizer(learning_rate), 
    mri_neural_network
);

Let the model train:

epochs = 20
losses = []
train_accs = []
test_accs = []

train_loader = simple_loader(training)

for epoch in 1:epochs
    model_loss = 0.0
    for (x, y) in train_loader
        curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_neural_network)
        Flux.update!(mri_optim, mri_neural_network, gradients[1])
        model_loss += curr_loss / length(train_loader)
    end
    train_acc = simple_accuracy(mri_neural_network, training)
    test_acc = simple_accuracy(mri_neural_network, testing)
    push!(losses, model_loss)
    @info "After epoch = $epoch" model_loss train_acc test_acc
end

Output Example: Here’s an example output of what I am seeing:

┌ Info: After epoch = 1
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 2
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 3
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
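
For reference, the logged loss of about 1.3863 is the value cross-entropy takes when a 4-class model assigns probability 0.25 to every class, since -log(1/4) = log(4). The following is a minimal plain-Julia sketch of that arithmetic, not the Flux code above; `softmax_vec` and `cross_entropy` are hypothetical helpers written only for illustration, and the all-zero scores are an assumed input.

```julia
# Softmax over a vector of raw scores.
softmax_vec(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

# Cross-entropy between a predicted distribution p and a one-hot target y.
cross_entropy(p, y) = -sum(y .* log.(p))

scores = zeros(4)                  # assumed: the network emits identical scores
p = softmax_vec(scores)            # uniform distribution over the 4 classes
y = [0.0, 0.0, 1.0, 0.0]           # any one-hot label gives the same loss

uniform_loss = cross_entropy(p, y) # equals log(4), roughly 1.3863
first_pick = argmax(p)             # with all values tied, argmax returns index 1

println((uniform_loss, first_pick))
```

If every prediction is tied like this, an argmax-based accuracy simply measures how often the first class occurs, which would explain an accuracy that never moves.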

Additional Notes: Just to have as an example, here is what the training data (data.features and data.targets) looks like in the code:

julia> training.features
128×128×20484 Array{Float32, 3}:
[:, :, 1] =
 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 ⋮                        ⋮    ⋱                 ⋮         
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0


julia> training.targets
20484-element Vector{Int32}:
 0
 0
 0
 0
 0
 ⋮
 3
 3
 3
 3
 3

Any ideas about what I could be doing wrong? Any more information I could provide?

Cheers!

~ tcp

yolhan_mannes · March 21, 2024, 10:09pm

Your loss isn’t moving, so I don’t think the accuracy is the issue.
1 - Ensure the gradient is computed correctly, meaning check that it’s not nothing or zero; Zygote can spring surprises sometimes (seems OK here).
2 - Add a little Dropout(0.1) in the chain after a Dense to force the model to generalize.
3 - Add a Dense layer to make it deeper.

If none of this works, maybe you need a Conv network for this problem instead.

See an MNIST example with Flux, for instance.

Edit: your learning rate is 1; make it 0.01 and use Adam.

jeremiedb · March 21, 2024, 10:25pm

learning_rate = 1: start with a lower learning rate; 1e-3 is a good starting point. With a learning rate > 0.1, it is not rare to see a model fail to learn anything.
Adam may be a better default choice of optimizer.
Check that data.features actually contains valid data. In the above post, it appears to be all 0s.
Perform a step-by-step sanity check on the training pipeline to validate that each component performs as intended. Notably, check that meaningful gradients are computed (which appears to be the case according to the example below):

using Flux

mri_neural_network = Chain(
    Dense(128^2 => 32, relu),
    Dense(32 => 4, relu),
    softmax
)

x = rand(128^2, 8)
y_hat = mri_neural_network(x)
y = Flux.onehotbatch(rand(0:3, 8), 0:3)
curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_neural_network)
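
To illustrate the learning-rate point, here is a toy sketch of plain gradient descent on f(w) = w^2 (gradient 2w). It is an assumption-laden stand-in, not the MRI model: `descend` is a hypothetical helper, and the specific rates are chosen only to show the three regimes.

```julia
# Plain gradient descent on f(w) = w^2, whose gradient is 2w.
function descend(lr; w0 = 1.0, steps = 50)
    w = w0
    for _ in 1:steps
        w -= lr * (2 * w)   # one descent step
    end
    return w
end

w_small = descend(0.01)  # each step multiplies w by 0.98, so it shrinks toward 0
w_one   = descend(1.0)   # each step maps w to -w, so the loss never decreases
w_large = descend(1.5)   # each step maps w to -2w, so the iterates blow up

println((w_small, w_one, w_large))
```

With a rate of 1 the toy loss stays exactly constant, which resembles a loss that does not move between epochs; a real network is more complicated, so treat this only as intuition.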

TheCedarPrince · March 22, 2024, 4:30am

Alright folks! I am back with an update – the short is I am still stuck and struggling but have observations and responses:

Alright, I checked the gradients and can confirm that they are nonzero but close to zero. The values across my input matrices are between 0 and 1 anyway, so the small values don’t seem alarming.

I did this and happily, the loss function is now showing that a loss is being calculated.

I did check and none of my images are entirely 0s. Curiously, I did find that for all my images, about half of the values in any image's matrix representation are 0s. I wonder if I need to crop these images further?

I had actually consulted the MNIST example to get started with the project! So it is strange to me that my model is now failing after reviewing the Flux docs and Model Zoo.

I am currently a bit at a loss (pardon the pun) as to what I am doing wrong to not have my accuracies improve. Any further ideas folks have about what I could be doing wrong?

Here is the current incarnation of my code:

#=
    Convenience function to have data altogether in one place for training.
=#
function simple_loader(data; batchsize::Int=64)
    x2dim = reshape(data.features, 128^2, :)
    yhot = Flux.onehotbatch(data.targets, 1:4)
    Flux.DataLoader((x2dim, yhot); batchsize, shuffle=true)
end

#=
    Calculates accuracy for a given model
=#
function simple_accuracy(model, data)
    (x, y) = only(simple_loader(data; batchsize=length(data.targets)))
    y_hat = model(x)
    iscorrect = Flux.onecold(y_hat) .== Flux.onecold(y)
    acc = round(100 * mean(iscorrect); digits=2)
end

mri_model = Chain(
    Dense(128^2 => 32, relu),
    Dropout(0.1),
    Dense(32 => 16, relu),
    Dropout(0.1),
    Dense(16 => 4, relu),
    Dropout(0.1),
    softmax
)

learning_rate = 1e-5
optimizer = Adam

mri_optim = Flux.setup(
    optimizer(learning_rate), 
    mri_model
);

epochs = 10
losses = []
train_accs = []
test_accs = []

train_loader = simple_loader(training)
for epoch in 1:epochs
    model_loss = 0.0
    for (x, y) in train_loader
        curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_model)
        Flux.update!(mri_optim, mri_model, gradients[1])
        model_loss += curr_loss / length(train_loader)
    end
    train_acc = simple_accuracy(mri_model, training)
    test_acc = simple_accuracy(mri_model, testing)
    push!(losses, model_loss)
    @info "After epoch = $epoch" model_loss train_acc test_acc
end

Here’s an example output:

┌ Info: After epoch = 1
│   model_loss = 1.3881081058643758
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 2
│   model_loss = 1.386685368604958
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 3
│   model_loss = 1.3868580223061144
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 4
│   model_loss = 1.3867240161634982
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 5
│   model_loss = 1.3866696674376726
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 6
│   model_loss = 1.3868442443199456
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 7
│   model_loss = 1.386385585181415
│   train_acc = 25.0
└   test_acc = 25.0

P.S. @jeremiedb , I also took your advice of making sure I evaluated things step by step and saw the exact same as you in making sure that things made sense with each step of the pipeline. I am not seeing anything glaringly wrong…

jeremiedb · March 22, 2024, 5:03am

It’s quite possible that, given the nature of the images and their associated labels, combined with an ad hoc MLP model not really adapted for images, not much learning can happen. I’d recommend you start from an established architecture, for example a ResNet: ResNet-like models · Metalhead.jl.
This will involve a different reshape to introduce the channel dimension, so that a batch is of size [128, 128, 1, batchsize].
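
A minimal sketch of that reshape, assuming a `features` array laid out like `data.features` above (the random array here is a hypothetical stand-in with only 10 images):

```julia
# Stand-in for data.features: 10 grayscale images of size 128x128.
features = rand(Float32, 128, 128, 10)

# Insert a singleton channel dimension: (width, height, channels, batch).
features4d = reshape(features, 128, 128, 1, :)

println(size(features4d))  # (128, 128, 1, 10)
```

`reshape` does not copy or reorder the pixels; image `k` is still found at `features4d[:, :, 1, k]`.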

Another trick would be to substitute your current image dataset with MNIST or an equivalent. If the model learns on MNIST but not when switching to your dataset, then perhaps the issue is on that side.

Also, what is the range of the values found in the image dataset? Values should be roughly normally distributed. Otherwise, preprocess them or add a BatchNorm operator as an initial layer.
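
One common form of preprocessing is to standardize the pixels to zero mean and unit variance. The sketch below assumes that is the kind of preprocessing meant here; the random `features` array is a hypothetical stand-in for the real images. Note that this only rescales the values and does not make them normally distributed.

```julia
using Statistics

# Hypothetical stand-in for the image data, with values in [0, 1).
features = rand(Float32, 128, 128, 10)

mu = mean(features)     # global mean over all pixels
sigma = std(features)   # global standard deviation over all pixels

# Shift and scale so the result has mean close to 0 and std close to 1.
standardized = (features .- mu) ./ sigma

println((mean(standardized), std(standardized)))
```

The same `mu` and `sigma` computed on the training set would normally be reused for the test set, so both are scaled identically.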

TheCedarPrince · March 22, 2024, 5:19am

Oh this is interesting. Indeed when I switch over to MNIST, it does learn and train but then not over the other dataset. Hm. I wonder if there is an error with the data somewhere?

Good thinking! However, I did try the BatchNorm function and had no luck, unfortunately. The values range between 0 and 1.

I’ll think on this some more but I think I shall also take a step back and see if there is something more fundamentally wrong with my data.

Thanks so much for the help!

yolhan_mannes · March 22, 2024, 7:39am

Maybe your data is a lot sparser than MNIST. What happens if you put everything in the training set, just to see?

TheCedarPrince · March 22, 2024, 4:13pm

I’ll give it a try to see. Curiously, MNIST has a proportion of non-zero values that is even smaller than my image data set. So I am very confused about why it just fails. I’ll report back what happens @yolhan_mannes (thank you very much for the comments so far!)

yolhan_mannes · March 22, 2024, 4:35pm

I guess you can’t make a git repo for the data because it’s personal?

TheCedarPrince · March 22, 2024, 4:52pm

I actually can! I could post up the data as well as the code – would you want to check it out @yolhan_mannes?

yolhan_mannes · March 22, 2024, 5:08pm

With pleasure, yes. I'll do it sometime in the next 5 hours.

Number is power in fine-tuning

TheCedarPrince · March 22, 2024, 5:11pm

You are awesome! I may not have time to post it today, but will ping you as soon as it is up. Thanks Yolhan!

ericphanson · March 22, 2024, 11:50pm

BTW Andrej Karpathy has a great blog post about common gotchas and tips for training neural networks; may be worth a read: A Recipe for Training Neural Networks

TheCedarPrince · March 24, 2024, 2:08pm

@yolhan_mannes, thanks for being willing to take a look at this! Here is my code with the example data I am using: Learning/Flux at main · TheCedarPrince/Learning · GitHub

The file “image_classifier.jmd” is the one with my neural network and the “alzheimers” directory is the one with the example data. I’d be curious what you think or if you can see an obvious error.

Thanks again so much!

~ tcp

yolhan_mannes · March 24, 2024, 4:23pm

OK, so I found a way to get these accuracies:

4-element Vector{Float64}:
 99.30264993026499
 51.92307692307693
 99.84375
 98.66071428571429

using,

model = Flux.Chain(
    Conv((3,3),1=>4,relu,pad=1),
    MaxPool((2,2)),
    Conv((3,3),4=>8,relu,pad=1),
    MaxPool((2,2)),
    Conv((3,3),8=>16,relu,pad=1),
    MaxPool((2,2)),
    Conv((3,3),16=>32,relu,pad=1),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(floor(Int,im_width/16*im_height/16*32)=>20,Flux.relu),
    Flux.Dropout(0.1),
    Dense(20=>4),
    Flux.softmax
) |> gpu
opt = Flux.setup(Adam(0.001),model)

I had to go to gpu here because it’s just a hard problem I think
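
One difference between this model and the MLPs earlier in the thread is that the last `Dense(20=>4)` has no `relu` before `softmax`. The plain-Julia sketch below shows what a `relu` in that position can do; it is only an illustration under assumed inputs (the `raw` scores are made up, and `relu` and `softmax_vec` are hypothetical helpers), not a confirmed diagnosis of the original problem.

```julia
relu(x) = max(x, zero(x))
softmax_vec(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

# Made-up pre-activation scores for the 4 classes, all negative.
raw = [-2.0, -0.5, -1.0, -3.0]

with_relu    = softmax_vec(relu.(raw))  # relu clamps every score to 0 first
without_relu = softmax_vec(raw)         # scores reach softmax unchanged

println(with_relu)             # [0.25, 0.25, 0.25, 0.25]
println(argmax(without_relu))  # 2, the class with the largest raw score
```

When all final scores are negative, the `relu` version collapses to a uniform prediction and a loss of log(4), whatever the input image is.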

yolhan_mannes · March 24, 2024, 4:30pm

OK, so for the data I did something weird, don’t ask:

im_width = 128
im_height = 128
function get_data(width,height,type)
    folder = readdir("alzheimers/$type")
    labels = copy(folder)
    D = Dict()
    for label in labels
        fold = joinpath("alzheimers/$type",label)
        files = readdir(fold)
        D[label] = zeros(Float32,width,height,1,length(files))
        for (i,file) in enumerate(files)
            im = imresize(load(joinpath(fold,file)),(width,height))
            D[label][:,:,1,i] .= channelview(im)
        end
    end    
    labels,D
end
class,D = get_data(im_width,im_height,"train")
total_im = mapreduce(v -> size(v,4) ,+,values(D))
datas = zeros(Float32,im_width,im_height,1,total_im);
labels = Vector{String}(undef,total_im)
c = 1
for i in eachindex(class)
    Di = D[class[i]]
    for j in axes(Di,4)
        labels[c] = class[i]
        datas[:,:,:,c] .= Di[:,:,:,j] 
        c+=1
    end
end
labels
datas = gpu(datas)
label_for_flux = Flux.onehotbatch(labels,class) |> gpu
datas_for_flux = Flux.DataLoader((datas,label_for_flux),batchsize=32,shuffle=true)

For the model,

model = Flux.Chain(
    Conv((3,3),1=>4,relu,pad=1),
    MaxPool((2,2)),
    Conv((3,3),4=>8,relu,pad=1),
    MaxPool((2,2)),
    Conv((3,3),8=>16,relu,pad=1),
    MaxPool((2,2)),
    Conv((3,3),16=>32,relu,pad=1),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(floor(Int,im_width/16*im_height/16*32)=>20,Flux.relu),
    Flux.Dropout(0.1),
    Dense(20=>4),
    Flux.softmax
) |> gpu

opt = Flux.setup(Adam(0.001),model)
loss(m,x,y) = Flux.crossentropy(m(x),y)

for i in 1:100
    CUDA.synchronize()
    Flux.train!(loss,model,datas_for_flux,opt)
    @info i,loss(model,datas,label_for_flux)
end

And to test,

class,D = get_data(im_width,im_height,"test")
Flux.testmode!(model)
purc = Float64[]
for i in 1:4
    Di = D[class[i]]
    lab_re = [ class[i] for _ in axes(Di,4)]
    lab_pred = model(gpu(Di)) |> cpu |> x-> Flux.onecold(x,class)
    push!(purc,(lab_re .== lab_pred ) |> sum |> x-> x/length(lab_re)*100)
end
purc

you will need

using Flux,Images,CUDA,cuDNN

TheCedarPrince · March 24, 2024, 4:43pm

HUNH! First of all, thank you so so so much for taking a look at this @yolhan_mannes – I deeply appreciate it and this write-up is just spectacular!

This is so intriguing to me! Since I am still quite new to Flux and working with neural networks, I had a few follow-up questions:

For the model, it seems we had to instead use a convolutional neural network – is that right with how I am interpreting it here?
Why do we need pad with each Conv call? I read the documentation on Conv but why we need pad is not clear to me.
Why are we using the MaxPool layer and what is it doing? I read the documentation but am unclear why this is important in the NN.
I know you said not to ask about the data loading, but what did you change? Did you change how it was read so as to make it more GPU friendly? Or why did you rewrite it this way?

I’ll see if I can get this working on my computer – I sadly do not have a GPU so this might be a bit tricky. Again, thanks for all the help!!!

yolhan_mannes · March 24, 2024, 4:52pm

OK, so first, yes. I will try with some Dense layers with a good layernorm here and there, but I’m a bit pessimistic here.

A pad (padding) just adds zeros at the rows or columns: pad=(1,0) will add one column of zeros at the left and one at the right of the matrix, and pad=1 is the same as pad=(1,1), so two more rows and two more columns. I then use a Conv layer with a 3x3 filter, meaning the matrix will not change size thanks to the padding, but the layer will still transform it.
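
The size bookkeeping can be checked with the standard convolution output-size formula, floor((input + 2*pad - kernel) / stride) + 1. A small sketch, where `conv_out` is a hypothetical helper and not a Flux function:

```julia
# Output length along one dimension of a convolution.
conv_out(input, kernel; pad = 0, stride = 1) =
    fld(input + 2 * pad - kernel, stride) + 1

same_size = conv_out(128, 3; pad = 1)  # 128: padding of 1 preserves the size
shrunk    = conv_out(128, 3; pad = 0)  # 126: without padding, 2 pixels are lost

println((same_size, shrunk))
```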

MaxPool((2,2)) just takes the max of every 2x2 submatrix (sliding a window, of course) going from the top left to the right, etc., meaning it divides the height by 2 and the width by 2.
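
A hand-rolled sketch of non-overlapping 2x2 max pooling on a single matrix, to make the halving concrete. `maxpool2x2` is a hypothetical helper written for illustration; Flux's `MaxPool` works on 4D arrays, so this mirrors the idea rather than the API.

```julia
# Non-overlapping 2x2 max pooling on a matrix with even dimensions.
function maxpool2x2(A)
    h, w = size(A) .÷ 2
    return [maximum(A[2i-1:2i, 2j-1:2j]) for i in 1:h, j in 1:w]
end

A = [1 2 5 6;
     3 4 7 8;
     9 1 2 3;
     0 2 4 1]

pooled = maxpool2x2(A)   # [4 8; 9 4]

# Four such poolings take 128 down to 128 ÷ 16 = 8 per side, so with 32
# channels the flattened input to the first Dense layer has this length:
flat_len = (128 ÷ 16) * (128 ÷ 16) * 32   # 2048

println((pooled, flat_len))
```

This is where the `im_width/16*im_height/16*32` expression in the model comes from.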

No, I just preferred coding from scratch myself; I would have lost more time actually reading your code. And the way I get the data is pretty bad.

TheCedarPrince · March 24, 2024, 5:01pm

Ahhh that makes complete sense – can’t believe I failed to recognize that.

Hunh! Wasn’t familiar with this approach yet!

Hahaha, that is entirely fair!

I think this has completely remedied my problem for now! Thank you so much for all the help! I’ll mark the post as resolved.

Final question but out of curiosity, where did you learn the intuition for building a neural network like this? If there are any papers or places you’d suggest looking, I’d be happy to do so.

Cheers Yolhan!

yolhan_mannes · March 24, 2024, 5:04pm

I mainly like to watch videos for this; really, YT is your friend. There is also the Flux model-zoo on GitHub if you want the best way to write basic ones. GitHub - FluxML/model-zoo: Please do not feed the models
