# NN Getting Same Training and Testing Accuracies?

Hi folks,

Background: I am having a bit of trouble training a deep neural network. My data is comprised of grayscale images with a dimension of 128x128. For training data, I have 128x128x5116 images and for testing data I have 128x128x20484 images.

Goal: With each of these images, I am supposed to predict a label from amongst four potential labels (i.e. 1, 2, 3, 4).

Problem: I am tinkering with my learning rate and trying to optimize the neural network I am building, but I keep getting the exact same accuracies for both my training and test sets. The loss changes ever so slightly, but not by much. I feel like I am doing something wrong in my optimization steps. Could someone take a look at my code and see if I am doing something "wrong"?

Edit 1: upon further scrutiny, I feel it has something to do with how I am computing accuracy. Am I doing something wrong there specifically?

Edit 2: never mind, I think that is working correctly, so I am unsure why I always get an accuracy of 25% for both testing and training.

Code:

Defining my small neural network:

``````
mri_neural_network = Chain(
    Dense(128^2 => 32, relu),
    Dense(32 => 4, relu),
    softmax
)
``````

Then I define some helper functions:

``````
using Flux, Statistics

"""
Convenience function to have data altogether in one place for training.
"""
function simple_loader(data; batchsize)  # signature inferred from the call in `simple_accuracy`
    x2dim = reshape(data.features, 128^2, :)      # flatten each image into one column
    yhot = Flux.onehotbatch(data.targets, 0:3)    # one-hot encode the labels 0..3
    Flux.DataLoader((x2dim, yhot); batchsize, shuffle=true)
end

"""
Calculates accuracy for a given model
"""
function simple_accuracy(model, data)
    (x, y) = only(simple_loader(data; batchsize=length(data.targets)))
    y_hat = model(x)
    iscorrect = Flux.onecold(y_hat) .== Flux.onecold(y)
    acc = round(100 * mean(iscorrect); digits=2)
end
``````
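
To make the accuracy computation above easier to check by hand, here is a minimal sketch in plain Julia on a tiny made-up batch. The `onecold` helper is a hand-written stand-in (a column-wise `argmax`, which is what `Flux.onecold` computes for a matrix), and the numbers are invented for illustration.

```julia
using Statistics

# Columns are samples; rows are the four class scores (made-up values).
y_hat = [0.1 0.7 0.2;
         0.6 0.1 0.2;
         0.2 0.1 0.5;
         0.1 0.1 0.1]

# One-hot targets for the same three samples.
y = [0 1 0;
     1 0 0;
     0 0 0;
     0 0 1]

# Hand-written stand-in: index of the largest entry in each column.
onecold(m) = [argmax(col) for col in eachcol(m)]

iscorrect = onecold(y_hat) .== onecold(y)     # [true, true, false]
accuracy = round(100 * mean(iscorrect); digits=2)
```

Two of the three samples are classified correctly here, so the accuracy comes out at roughly 66.67%. If the same check on a handful of real samples agrees with `simple_accuracy`, the accuracy code is probably not the culprit.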

Then I define the descent optimizer I want to use during training optimization:

``````
learning_rate = 1
optimizer = Descent

mri_optim = Flux.setup(
    optimizer(learning_rate),
    mri_neural_network
);
``````

Let the model train:

``````
epochs = 20
losses = []
train_accs = []
test_accs = []

for epoch in 1:epochs
    model_loss = 0.0
    for (x, y) in train_loader
        curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_neural_network)
        # Apply the gradient step; without it the parameters never change.
        Flux.update!(mri_optim, mri_neural_network, gradients[1])
        model_loss += curr_loss / length(train_loader)
    end
    train_acc = simple_accuracy(mri_neural_network, training)
    test_acc = simple_accuracy(mri_neural_network, testing)
    push!(losses, model_loss)
    @info "After epoch = $epoch" model_loss train_acc test_acc
end
``````

Output Example: Here's an example output of what I am seeing:

``````
┌ Info: After epoch = 1
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 2
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 3
│   model_loss = 1.3862956166267395
│   train_acc = 25.0
└   test_acc = 25.0
``````
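
A note on the numbers above: the loss of 1.3862956 is almost exactly ln 4 ≈ 1.386294, which is the cross-entropy of a model that assigns probability 0.25 to each of the four classes. The sketch below, in plain Julia, shows one way this can arise: a `relu` on the output layer clips every negative logit to zero, so the `softmax` that follows is uniform. The `softmax`, `relu` and `crossentropy` helpers are hand-written stand-ins for illustration (not Flux's implementations), and the logits are hypothetical.

```julia
# Hand-written stand-ins for illustration; not Flux's own functions.
softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))
relu(z) = max.(z, 0)
crossentropy(p, y) = -sum(y .* log.(p))

# Hypothetical pre-activation outputs of the last Dense layer, all negative.
logits = [-3.2, -0.7, -1.5, -4.1]

p = softmax(relu(logits))    # relu clips everything to 0, so softmax is uniform
y = [0, 0, 1, 0]             # one-hot target; any class gives the same loss here

loss = crossentropy(p, y)    # equals log(4)
predicted = argmax(p)        # ties resolve to the first index
```

If this is what is happening, every sample receives the same prediction, which would be consistent with an accuracy stuck at the share of a single class. Whether that is the cause in this thread is an assumption; printing `mri_neural_network(x)` for a few inputs would confirm or rule it out.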

Additional Notes: Just to have as an example, here is what the training data (`data.features` and `data.targets`) looks like in the code:

``````
julia> training.features
128×128×20484 Array{Float32, 3}:
[:, :, 1] =
0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
⋮                        ⋮    ⋱                 ⋮
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0

julia> training.targets
20484-element Vector{Int32}:
0
0
0
0
0
⋮
3
3
3
3
3
``````

Any ideas about what I could be doing wrong? Any more information I could provide?

Cheers!

~ tcp

2 Likes

Your loss isn't moving, so I don't think the accuracy computation is the issue.

1. Make sure the gradient is computed properly, i.e. check that it is not `nothing` or zero; Zygote can hold surprises sometimes (it seems OK here).
2. Add a small `Dropout(0.1)` to the chain after a `Dense` layer to force the model to generalize.
3. Add another dense layer to make the network deeper.

If none of this works, you may need a convolutional network for this problem instead.

See, for instance, an MNIST example with Flux.

Edit: your learning rate is 1; make it 0.01 and use Adam.
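
As a rough sketch of that last suggestion, the snippet below sets up `Adam` with a learning rate of 0.01 and applies a single update step. The small model and the random data are stand-ins invented for illustration; only the `Flux.setup` / `Flux.withgradient` / `Flux.update!` pattern is taken from the code in this thread.

```julia
using Flux

# Small stand-in model and random data, for illustration only.
model = Chain(Dense(16 => 8, relu), Dense(8 => 4), softmax)
x = rand(Float32, 16, 32)
y = Flux.onehotbatch(rand(1:4, 32), 1:4)

# Adam with a learning rate of 0.01 instead of Descent with a learning rate of 1.
opt_state = Flux.setup(Adam(0.01), model)

w_before = copy(model[2].weight)
loss, grads = Flux.withgradient(m -> Flux.crossentropy(m(x), y), model)
Flux.update!(opt_state, model, grads[1])   # apply one optimisation step
w_after = model[2].weight
```

Comparing `w_before` and `w_after` is a cheap way to confirm that the update step really changes the parameters.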

3 Likes
• `learning_rate = 1`: start with a lower learning rate; 1e-3 is a good starting point. With a learning rate > 0.1, it is not rare to see a model fail to learn anything.
• `Adam` may be a better default choice of optimizer.
• Check that `data.features` actually contains valid data. In the post above, it appears to be all 0s.
• Perform a step-by-step sanity check of the training pipeline to validate that each component performs as intended. Notably, check that meaningful gradients are computed (which appears to be the case according to the example below):
``````
using Flux

mri_neural_network = Chain(
    Dense(128^2 => 32, relu),
    Dense(32 => 4, relu),
    softmax
)

x = rand(128^2, 8)
y_hat = mri_neural_network(x)
y = Flux.onehotbatch(rand(0:3, 8), 0:3)
curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_neural_network)
``````
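
For the data check suggested in the list above, a minimal sketch in plain Julia could look like the following. The `features` array is a synthetic stand-in with the same layout as `data.features` (128×128×N, `Float32`); the zeroed regions are made up so that the check has something to find.

```julia
using Statistics

# Synthetic stand-in for `data.features`; the layout is assumed from the post.
features = rand(Float32, 128, 128, 10)
features[:, 1:64, :] .= 0    # make half of every image zero, as an example
features[:, :, 3] .= 0       # and make one image entirely blank

# Fraction of zero pixels in each image.
zero_fraction = [mean(iszero, img) for img in eachslice(features; dims=3)]

# Indices of images that contain no signal at all.
blank_images = findall(==(1.0), zero_fraction)

# Overall value range, useful when deciding whether to rescale.
value_range = extrema(features)
```

On real data, a non-empty `blank_images` or a surprising `value_range` would point at the loading code rather than at the model.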
5 Likes

Alright folks! I am back with an update. The short version is that I am still stuck and struggling, but I have observations and responses:

Alright, I checked this and can confirm that the gradients are nonzero but close to zero. The values across my matrix start out between 0 and 1 anyway, so the small values don't seem alarming.

I did this and happily, the loss function is now showing that a loss is being calculated.

I did check, and none of my images are all 0s. Curiously, I did find that in every image about half of the values in its matrix representation are 0s. I wonder if I need to crop these images further?

I had actually consulted this to get started with the project! So it is strange to me that my model is now failing after reviewing the Flux docs and Model Zoo.

I am currently a bit at a loss (pardon the pun) as to what I am doing wrong to not have my accuracies improve. Any further ideas folks have about what I could be doing wrong?

Here is the current incarnation of my code:

``````
#=
Convenience function to have data altogether in one place for training.
=#
function simple_loader(data; batchsize)  # signature inferred from the call in `simple_accuracy`
    x2dim = reshape(data.features, 128^2, :)
    yhot = Flux.onehotbatch(data.targets, 1:4)
    Flux.DataLoader((x2dim, yhot); batchsize, shuffle=true)
end

#=
Calculates accuracy for a given model
=#
function simple_accuracy(model, data)
    (x, y) = only(simple_loader(data; batchsize=length(data.targets)))
    y_hat = model(x)
    iscorrect = Flux.onecold(y_hat) .== Flux.onecold(y)
    acc = round(100 * mean(iscorrect); digits=2)
end

mri_model = Chain(
    Dense(128^2 => 32, relu),
    Dropout(0.1),
    Dense(32 => 16, relu),
    Dropout(0.1),
    Dense(16 => 4, relu),
    Dropout(0.1),
    softmax
)

learning_rate = 1e-5

mri_optim = Flux.setup(
    optimizer(learning_rate),
    mri_model
);

epochs = 10
losses = []
train_accs = []
test_accs = []

for epoch in 1:epochs
    model_loss = 0.0
    for (x, y) in train_loader
        curr_loss, gradients = Flux.withgradient(m -> Flux.crossentropy(m(x), y), mri_model)
        # Apply the gradient step; without it the parameters never change.
        Flux.update!(mri_optim, mri_model, gradients[1])
        model_loss += curr_loss / length(train_loader)
    end
    train_acc = simple_accuracy(mri_model, training)
    test_acc = simple_accuracy(mri_model, testing)
    push!(losses, model_loss)
    @info "After epoch = $epoch" model_loss train_acc test_acc
end
``````

Here's an example output:

``````
┌ Info: After epoch = 1
│   model_loss = 1.3881081058643758
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 2
│   model_loss = 1.386685368604958
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 3
│   model_loss = 1.3868580223061144
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 4
│   model_loss = 1.3867240161634982
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 5
│   model_loss = 1.3866696674376726
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 6
│   model_loss = 1.3868442443199456
│   train_acc = 25.0
└   test_acc = 25.0
┌ Info: After epoch = 7
│   model_loss = 1.386385585181415
│   train_acc = 25.0
└   test_acc = 25.0
``````

P.S. @jeremiedb, I also took your advice to evaluate things step by step, and I saw exactly the same as you: each step of the pipeline made sense. I am not seeing anything glaringly wrong...

1 Like

It's quite possible that, given the nature of the images and their associated labels, combined with an ad hoc MLP model that is not really suited to images, not much learning can happen. I'd recommend starting from an established architecture, for example a ResNet: ResNet-like models · Metalhead.jl.
This will involve a different reshape to introduce the channel dimension, so that a batch has size `[128, 128, 1, batchsize]`.
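
The reshape mentioned here can be sketched in plain Julia as follows. The `features` array is a synthetic stand-in for `data.features`, assumed to be stored as width × height × N.

```julia
# Synthetic stand-in for `data.features`, stored as width × height × N.
features = rand(Float32, 128, 128, 10)

# Insert a singleton channel dimension: width × height × channels × N.
features_whcn = reshape(features, 128, 128, 1, :)

size(features_whcn)   # (128, 128, 1, 10)
```

`reshape` does not copy the data, so this costs essentially nothing even for the full dataset.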

Another trick would be to substitute the MNIST dataset (or an equivalent) for your current image dataset. If the model learns on MNIST but not when switching to your dataset, then perhaps the issue is on the data side.

Also, what is the range of the values found in the image dataset? Values should be roughly normally distributed. Otherwise, preprocess them or add a `BatchNorm` operator as an initial layer.
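
One simple form of the preprocessing mentioned above is to standardize the pixel values to zero mean and unit variance. This is a minimal sketch on synthetic data; the choice of a single global mean and standard deviation (rather than per-image or per-pixel statistics) is an assumption made for illustration.

```julia
using Statistics

# Synthetic stand-in for the image data, with values in [0, 1).
features = rand(Float32, 128, 128, 10)

μ = mean(features)    # global mean over all pixels
σ = std(features)     # global standard deviation over all pixels

standardized = (features .- μ) ./ σ
```

In practice `μ` and `σ` would be computed on the training set only and then reused unchanged for the test set.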

2 Likes

Oh this is interesting. Indeed when I switch over to MNIST, it does learn and train but then not over the other dataset. Hm. I wonder if there is an error with the data somewhere?

Good thinking! However, I did try adding a `BatchNorm` layer, and no luck, unfortunately. The values range from 0 to 1.

I'll think on this some more, but I think I shall also take a step back and see if there is something more fundamentally wrong with my data.

Thanks so much for the help!

1 Like

Maybe your data is a lot sparser than MNIST. What happens if you put everything in the training set, just to see?

1 Like

I'll give it a try and see. Curiously, MNIST has an even smaller proportion of non-zero values than my image dataset, so I am very confused about why it just fails. I'll report back on what happens, @yolhan_mannes (thank you very much for the comments so far!)

1 Like

I guess you can't make a git repo for the data because it's personal?

1 Like

I actually can! I could post the data as well as the code. Would you want to check it out, @yolhan_mannes?

With pleasure, yes. I'll do it sometime in the next 5 hours.

Number is power in fine-tuning

1 Like

You are awesome! I may not have time to post it today, but will ping you as soon as it is up. Thanks Yolhan!

1 Like

BTW Andrej Karpathy has a great blog post about common gotchas and tips for training neural networks; may be worth a read: A Recipe for Training Neural Networks

1 Like

@yolhan_mannes, thanks for being willing to take a look at this! Here is my code with the example data I am using: Learning/Flux at main · TheCedarPrince/Learning · GitHub

The file "image_classifier.jmd" is the one with my neural network and the "alzheimers" directory is the one with the example data. I'd be curious what you think or if you can see an obvious error.

Thanks again so much!

~ tcp

OK, so I found a way to get these accuracies:

``````
4-element Vector{Float64}:
99.30264993026499
51.92307692307693
99.84375
98.66071428571429
``````

using:

``````
model = Flux.Chain(
    MaxPool((2,2)),
    MaxPool((2,2)),
    MaxPool((2,2)),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(floor(Int,im_width/16*im_height/16*32)=>20,Flux.relu),
    Flux.Dropout(0.1),
    Dense(20=>4),
    Flux.softmax
) |> gpu
``````

I had to go to the GPU here because it's just a hard problem, I think.

1 Like

OK, so for the data I did something weird, don't ask:

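``````
im_width = 128
im_height = 128
function get_data(width,height,type)
    # NOTE: the definitions of `folder`, `files` and `im` are not shown in the post;
    # the `readdir`/`load` lines below are assumptions.
    folder = readdir("alzheimers/$type")          # assumed: one sub-directory per class
    labels = copy(folder)
    D = Dict()
    for label in labels
        fold = joinpath("alzheimers/$type",label)
        files = readdir(fold)                     # assumed
        D[label] = zeros(Float32,width,height,1,length(files))
        for (i,file) in enumerate(files)
            im = load(joinpath(fold,file))        # assumed: `load` as provided by Images
            D[label][:,:,1,i] .= channelview(im)
        end
    end
    labels,D
end
class,D = get_data(im_width,im_height,"train")
total_im = mapreduce(v -> size(v,4) ,+,values(D))
datas = zeros(Float32,im_width,im_height,1,total_im);
labels = Vector{String}(undef,total_im)
c = 1
# Concatenate the per-class arrays into one array, recording each image's label.
for i in eachindex(class)
    Di = D[class[i]]
    for j in axes(Di,4)
        labels[c] = class[i]
        datas[:,:,:,c] .= Di[:,:,:,j]
        c+=1
    end
end
labels
datas = gpu(datas)
label_for_flux = Flux.onehotbatch(labels,class) |> gpu
``````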

For the model,

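``````
# NOTE: as listed, the chain has no `Conv` layers, yet the `Dense` input size
# assumes 32 channels after the four poolings (128/16 * 128/16 * 32 features).
model = Flux.Chain(
    MaxPool((2,2)),
    MaxPool((2,2)),
    MaxPool((2,2)),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(floor(Int,im_width/16*im_height/16*32)=>20,Flux.relu),
    Flux.Dropout(0.1),
    Dense(20=>4),
    Flux.softmax
) |> gpu

loss(m,x,y) = Flux.crossentropy(m(x),y)

# NOTE: `opt` and `datas_for_flux` are not defined in this snippet.
for i in 1:100
    CUDA.synchronize()
    Flux.train!(loss,model,datas_for_flux,opt)
    @info i,loss(model,datas,label_for_flux)
end
``````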
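
The discussion further down mentions `Conv` layers with a 3x3 filter and padding, and the `Dense` layer expects 32 channels, so the model presumably interleaved convolutions with the pooling layers. The following is a CPU-only sketch under that assumption. The intermediate channel widths (1 → 4 → 8 → 16 → 32) are invented for illustration; only the final 32, the 3x3 filter, the padding and the four poolings are implied by the thread.

```julia
using Flux

im_width, im_height = 128, 128

# Assumed channel widths; only the final 32 is implied by the Dense input size.
model = Chain(
    Conv((3, 3), 1 => 4, relu; pad=1),   MaxPool((2, 2)),
    Conv((3, 3), 4 => 8, relu; pad=1),   MaxPool((2, 2)),
    Conv((3, 3), 8 => 16, relu; pad=1),  MaxPool((2, 2)),
    Conv((3, 3), 16 => 32, relu; pad=1), MaxPool((2, 2)),
    Flux.flatten,
    Dense((im_width ÷ 16) * (im_height ÷ 16) * 32 => 20, relu),
    Dropout(0.1),
    Dense(20 => 4),
    softmax,
)

# Two random grayscale "images" in width × height × channels × batch layout.
x = rand(Float32, im_width, im_height, 1, 2)
out = model(x)    # one column of four class probabilities per image
```

Running a random batch through the chain like this is a quick way to confirm that the `Dense` input size matches what the convolution and pooling stack actually produces.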

And to test:

``````
class,D = get_data(im_width,im_height,"test")
Flux.testmode!(model)
purc = Float64[]
for i in 1:4
    Di = D[class[i]]
    lab_re = [ class[i] for _ in axes(Di,4)]
    lab_pred = model(gpu(Di)) |> cpu |> x-> Flux.onecold(x,class)
    push!(purc,(lab_re .== lab_pred ) |> sum |> x-> x/length(lab_re)*100)
end
purc
``````

You will need:

``````
using Flux,Images,CUDA,cuDNN
``````
1 Like

HUNH! First of all, thank you so so so much for taking a look at this, @yolhan_mannes. I deeply appreciate it, and this write-up is just spectacular!

This is so intriguing to me! Since I am still quite new to Flux and working with neural networks, I had a few follow-up questions:

1. For the model, it seems we had to use a convolutional neural network instead. Am I interpreting that right?
2. Why do we need pad with each `Conv` call? I read the documentation on `Conv` but why we need pad is not clear to me.
3. Why are we using the `MaxPool` layer and what is it doing? I read the documentation but am unclear why this is important in the NN.
4. I know you said not to ask about the data loading, but what did you change? Did you change how it was read so as to make it more GPU friendly? Or why did you rewrite it this way?

I'll see if I can get this working on my computer. I sadly do not have a GPU, so this might be a bit tricky. Again, thanks for all the help!!!

OK, so first: yes. I will try with some Dense layers with good layernorm here and there, but I'm a bit pessimistic here.

A pad (padding) just adds zeros around the border of the matrix: `pad = (1, 0)` adds one line of zeros on each side along the first dimension only, and `pad = 1` is the same as `pad = (1, 1)`, so two more rows and two more columns in total. I then use a `Conv` layer with a 3x3 filter, so thanks to the padding the matrix does not change size, but the filter still transforms it.
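
The size argument can be checked with the standard output-size formula for a convolution along one dimension, `(n + 2*pad - k) ÷ stride + 1`. The helper `conv_out` below is a hypothetical name introduced for this sketch, not a Flux function.

```julia
# Output length along one dimension of a convolution with filter size k.
conv_out(n, k; pad=0, stride=1) = (n + 2pad - k) ÷ stride + 1

same_size = conv_out(128, 3; pad=1)   # 128: a 3x3 filter with pad=1 preserves the size
shrunk    = conv_out(128, 3; pad=0)   # 126: without padding each side loses 2 pixels
```

So with a 3x3 filter, `pad = 1` is exactly what keeps a 128×128 image at 128×128 after each convolution.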

`MaxPool((2,2))` just takes the maximum of every 2x2 submatrix (sliding a window from the top left to the right, and so on), which divides both the height and the width by 2.
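
A naive version of this pooling step can be written in a few lines of plain Julia. The function `maxpool2x2` is a hand-written illustration for a single matrix, not Flux's `MaxPool`.

```julia
# Naive 2x2 max pooling with stride 2 over a matrix (illustration only).
function maxpool2x2(A::AbstractMatrix)
    h, w = size(A) .÷ 2
    [maximum(A[2i-1:2i, 2j-1:2j]) for i in 1:h, j in 1:w]
end

A = [ 1  2  3  4;
      5  6  7  8;
      9 10 11 12;
     13 14 15 16]

pooled = maxpool2x2(A)   # [6 8; 14 16]

# Four successive poolings shrink a side of 128 down to 128 ÷ 2^4 = 8.
side = 128 ÷ 2^4
```

This is where the `im_width/16*im_height/16` factor in the `Dense` layer comes from: after four poolings a 128×128 image is 8×8 per channel.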

No, I just preferred coding it from scratch myself; I would have lost more time actually reading your code. And the way I get the data is pretty bad.

1 Like

Ahhh that makes complete sense, can't believe I failed to recognize that.

Hunh! Wasn't familiar with this approach yet!

Hahaha, that is entirely fair!

I think this has completely remedied my problem for now! Thank you so much for all the help! I'll mark the post as resolved.

Final question, out of curiosity: where did you learn the intuition for building a neural network like this? If there are any papers or places you'd suggest looking, I'd be happy to do so.

Cheers Yolhan!

I mainly like to watch videos for this; YouTube really is your friend. There is also the Flux model zoo on GitHub if you want to see the best way to write basic ones: GitHub - FluxML/model-zoo: Please do not feed the models

1 Like