Memory usage increasing with each epoch

Depending on what’s causing the memory growth, I don’t think it can be extrapolated that way. For example, this could be “CUDNN cache locking prevents finalizers resulting in OOMs” (Issue #1461 · JuliaGPU/CUDA.jl · GitHub) resurfacing.

For next steps, I would first check whether setting a hard memory limit and calling GC.gc(false) every ~10 training steps keeps memory usage in check. Another thing to try is putting the training loop in its own function, to rule out problems caused by global variables or by the loss-function closure being re-created. If none of the above combined makes a difference, then there’s a deeper issue.
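
To be concrete, here’s a minimal sketch of the two suggestions together, assuming a recent CUDA.jl (where a hard limit can be set through the `JULIA_CUDA_HARD_MEMORY_LIMIT` environment variable before the package initializes) and explicit-style Flux; the model, data, and optimizer-state names are placeholders for whatever you actually have:

```julia
# Assumption: recent CUDA.jl. The limit must be set before CUDA.jl initializes the GPU.
ENV["JULIA_CUDA_HARD_MEMORY_LIMIT"] = "4GiB"   # pick something below your GPU's capacity

using CUDA, Flux

# Training loop in its own function: no globals, and the loss closure is not
# rebuilt at global scope on every epoch.
function train!(model, opt_state, data; gc_every = 10)
    for (step, (x, y)) in enumerate(data)
        grads = Flux.gradient(m -> Flux.logitcrossentropy(m(x), y), model)
        Flux.update!(opt_state, model, grads[1])

        # Periodic incremental GC so finalizers of dead GPU arrays actually run
        # and their memory is returned to the pool.
        if step % gc_every == 0
            GC.gc(false)
            CUDA.reclaim()   # optionally hand freed pool memory back to the driver
        end
    end
end
```

The idea is that the hard limit stops the pool from growing unboundedly, while the periodic GC.gc(false) gives finalizers a chance to run between allocations so freed GPU arrays can actually be reused.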