Well… the code is now running with batches of size 10 rather than 1000.
I vaguely remember trying even a batch size of 1 and still getting the same error, but I didn’t save the example, so I’ll remain wondering…
Btw, I was checking GPU memory usage with `nvidia-smi` and, independently of the batch size, the code leaves only a few MB of VRAM available while running the main loop, which I copy here for easy reference:
```julia
for epoch = 1:epochs
    @info "epoch" epoch
    for i in 1:batchnum
        batch = trainset[i] |> gpu
        gs = gradient(params(m)) do
            loss(batch...)
        end
        @info "batch fraction" i/batchnum
        update!(opt, params(m), gs)
    end
    @show accuracy(valX, valY)
end
```
The error is raised when trying to run `accuracy`.
I’m now running a binary search to find the critical batch size, for what it’s worth.
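In case it helps to sidestep the OOM at the evaluation step in the meantime: here is a rough, untested sketch of computing accuracy in minibatches, so the whole validation set never has to sit on the GPU at once. It assumes a classification model `m`, one-hot labels, and observations along the last dimension (`batched_accuracy` and the `batchsize` keyword are just names I made up for the example):

```julia
using Flux  # assumes the same Flux session as the training loop above

# Hypothetical helper: evaluate accuracy one slice at a time instead of
# a single accuracy(valX, valY) call over the full validation set.
function batched_accuracy(m, X, Y; batchsize = 100)
    correct = 0
    n = size(X)[end]
    for idx in Iterators.partition(1:n, batchsize)
        x = X[:, idx] |> gpu              # move only this slice to the GPU
        ŷ = m(x) |> cpu                   # bring predictions back to the CPU
        correct += sum(Flux.onecold(ŷ) .== Flux.onecold(Y[:, idx]))
    end
    return correct / n
end
```

If the validation tensors have more than two dimensions (e.g. images in WHCN layout), the indexing would need `X[:, :, :, idx]` instead, but the idea is the same.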