Running OOM trying to load data to GPU

Hello there.
After training my CNN, I wrote a function to estimate its accuracy.

using BSON, CUDA, Flux
using Statistics: mean

function accuracy(A, B, name)
    # Load the trained model from disk and move it to the GPU
    BSON.@load name * ".bson" model
    model = model |> gpu
    X1, Y1 = A
    X2, Y2 = B
    # Forward pass on the GPU, predictions brought back to the CPU
    Y_tr_r = model(X1 |> gpu) |> cpu
    Y_te_r = model(X2 |> gpu) |> cpu
    # Percentage of predictions within atol of the targets
    a = mean(isapprox.(Y_tr_r, Y1; atol=0.015)) * 100
    b = mean(isapprox.(Y_te_r, Y2; atol=0.015)) * 100
    return a, b
end

The problem is that the GPU runs out of memory when I try to load the data onto it. I do not understand why, since the data is not that big:
X1 is an Array{Float64,4} with dims=(128,128,1,5000), X2 is similarly an Array{Float64,4} with dims=(128,128,1,1250), and Y1 and Y2 are even smaller.
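(For reference, a quick back-of-the-envelope check of the raw input sizes, ignoring anything allocated during the forward pass:)

# Rough size of the raw inputs in Float64, in GiB (sketch only; this does
# not count the forward-pass activations).
julia> 128 * 128 * 1 * 5000 * sizeof(Float64) / 1024^3   # X1
0.6103515625

julia> 128 * 128 * 1 * 1250 * sizeof(Float64) / 1024^3   # X2
0.152587890625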
The exact line where it errors is Y_tr_r = model(X1 |> gpu) |> cpu.
GPU: Nvidia GeForce GTX 1660 SUPER (6 GB VRAM).
Yes, I call CUDA.reclaim() after finishing training, so the VRAM is mostly free…
Using the CPU works, but I would like to know whether evaluating on the GPU would be faster.

What is model? If your model is large enough, the allocations from the forward pass alone could be enough to OOM. Even for something the size of a ResNet-18, 128^2 inputs with a batch size of 5000 could OOM an 8 GB GPU, let alone a 6 GB one (consider that batch sizes are usually < 512)!
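One way around it is to evaluate in smaller minibatches, so only a chunk of activations lives on the GPU at a time. A minimal sketch using Flux.DataLoader (the batch size and the helper name predict_batched are arbitrary; it assumes the data is batched along the last dimension):

using Flux, CUDA

# Sketch: run the forward pass in small chunks and concatenate the results
# on the CPU. predict_batched is a hypothetical helper, not part of Flux.
function predict_batched(model, X; batchsize=128)
    outs = [model(x |> gpu) |> cpu for x in Flux.DataLoader(X; batchsize=batchsize)]
    return cat(outs...; dims=ndims(first(outs)))  # join along the batch dimension
end

Y_tr_r = predict_batched(model, X1)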

My apologies for not answering earlier. I do believe that was the problem: I was abusing my GPU with a ridiculous number of neurons after the Conv layers…
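(In case anyone hits the same thing, a rough sketch of how to sanity-check the model size in Flux — this only counts parameter memory, not the forward-pass activations:)

using Flux

# Count trainable parameters and estimate their memory footprint
# (Float32 parameters take 4 bytes each).
n_params = sum(length, Flux.params(model))
println("parameters: ", n_params, " (~", n_params * 4 / 1024^2, " MiB as Float32)")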