Memory usage increasing with each epoch

Depending on what’s causing the memory growth, I don’t think it can be extrapolated that way. For example, this could be “CUDNN cache locking prevents finalizers resulting in OOMs” (Issue #1461 · JuliaGPU/CUDA.jl · GitHub) resurfacing.

For next steps, I would first check whether setting a hard memory limit and calling GC.gc(false) every ~10 training steps keeps memory usage in check. Another thing to try is putting the training loop in its own function, to rule out problems caused by global variables or by the loss-function closure being re-created. If none of the above combined makes a difference, then there’s a deeper issue.
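
To be concrete, here’s a minimal sketch of the two suggestions together, assuming a recent CUDA.jl (where a hard limit can be set through the `JULIA_CUDA_HARD_MEMORY_LIMIT` environment variable before the package initializes) and explicit-style Flux; the model, data, and optimizer-state names are placeholders for whatever you actually have:

```julia
# Assumption: recent CUDA.jl. The limit must be set before CUDA.jl initializes the GPU.
ENV["JULIA_CUDA_HARD_MEMORY_LIMIT"] = "4GiB"   # pick something below your GPU's capacity

using CUDA, Flux

# Training loop in its own function: no globals, and the loss closure is not
# rebuilt at global scope on every epoch.
function train!(model, opt_state, data; gc_every = 10)
    for (step, (x, y)) in enumerate(data)
        grads = Flux.gradient(m -> Flux.logitcrossentropy(m(x), y), model)
        Flux.update!(opt_state, model, grads[1])

        # Periodic incremental GC so finalizers of dead GPU arrays actually run
        # and their memory is returned to the pool.
        if step % gc_every == 0
            GC.gc(false)
            CUDA.reclaim()   # optionally hand freed pool memory back to the driver
        end
    end
end
```

The idea is that the hard limit stops the pool from growing unboundedly, while the periodic GC.gc(false) gives finalizers a chance to run between allocations so freed GPU arrays can actually be reused.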