`CUDA error: out of memory` with Flux

Well… the code is now running with batches of size 10 rather than 1000.
I vaguely remember having tried even with a batch size of 1 and getting the same error, but I didn’t save the example, so I’ll remain wondering…
Btw, I was checking the GPU memory usage with nvidia-smi and, independently of the batch size, the code leaves only a few MB of RAM available while running the main loop, which I copy here for easy reference:

for epoch = 1:epochs
    @info "epoch" epoch
    for i in 1:batchnum
        # move the current batch to the GPU
        batch = trainset[i] |> gpu
        # gradients of the loss w.r.t. the model parameters
        gs = gradient(params(m)) do
            l = loss(batch...)
        end
        @info "batch fraction" i/batchnum
        # optimiser step
        update!(opt, params(m), gs)
    end
    @show accuracy(valX, valY)
end
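
Since I was only watching nvidia-smi from the outside, I also started printing the free device memory from inside the loop. This is just a small sketch, assuming the GPU backend is CUDA.jl (for older CuArrays-based setups the calls would differ), and `report_gpu_memory` is a helper name I made up:

```julia
using CUDA

# Hypothetical helper: log how much device memory is still free.
# CUDA.available_memory() / CUDA.total_memory() report the raw device numbers,
# and CUDA.memory_status() additionally prints what the Julia memory pool holds.
function report_gpu_memory(tag)
    free_mb  = CUDA.available_memory() / 2^20
    total_mb = CUDA.total_memory() / 2^20
    @info "GPU memory (MiB)" tag free_mb total_mb
    CUDA.memory_status()
end

# e.g. once per epoch, after forcing a collection so arrays that are merely
# unreferenced (but not yet freed) don't inflate the numbers:
GC.gc()
CUDA.reclaim()
report_gpu_memory("after epoch")
```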

The error is raised when trying to run `accuracy`.
I’m now doing a binary search to find the critical batch size, for what it’s worth.
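
Since the OOM is raised by `accuracy(valX, valY)` rather than by the gradient step, another thing I want to try is evaluating accuracy in chunks, moving only one slice of the validation set to the GPU at a time. A rough sketch, assuming `valX` and `valY` are 2-D arrays with observations along the last dimension and one-hot labels (`batched_accuracy` and `batchsize` are names I made up, not Flux API):

```julia
using Flux
using Flux: onecold

# Hypothetical chunked replacement for accuracy(valX, valY): the full
# validation set never has to fit on the GPU at once.
function batched_accuracy(m, valX, valY; batchsize = 100)
    n = size(valX, 2)
    correct = 0
    for idx in Iterators.partition(1:n, batchsize)
        x = valX[:, idx] |> gpu     # move only this slice to the device
        ŷ = m(x) |> cpu             # bring predictions back to the CPU
        correct += sum(onecold(ŷ) .== onecold(valY[:, idx]))
    end
    return correct / n
end

@show batched_accuracy(m, valX, valY)
```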