CUDA memory isn't freed and cannot be backtracked

This is a perfectly normal report: your tensors are only using about 5MB of GPU memory, while the underlying memory pool that allocations are served from is currently sized at around 5GB, so the pool as a whole occupies most of the physical memory on your device. That does not mean the memory is unavailable. The cached space is handed out again to new allocations, so you can still allocate roughly 5GB minus 5MB from the pool before it has to grow. So this isn't indicative of an OOM or a memory leak.
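The gap between "used" and "pool size" can be illustrated with a deliberately simplified toy model of a caching allocator. `ToyCachingPool` and its methods are hypothetical and only mimic the idea: freed blocks stay in the pool instead of going back to the driver, and later requests are carved out of them. Assuming the report comes from PyTorch's caching allocator, the two counters correspond roughly to `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`, and `torch.cuda.empty_cache()` releases unused cached blocks. The real allocator is far more involved (size classes, block coalescing, streams).

```python
class ToyCachingPool:
    """Hypothetical, heavily simplified model of a caching GPU allocator."""

    def __init__(self):
        self.allocated = 0      # bytes currently handed out to live "tensors"
        self.reserved = 0       # bytes obtained from the "driver" and kept in the pool
        self._free_blocks = []  # sizes of cached blocks that are not in use

    def malloc(self, size):
        # Reuse the smallest cached block that is large enough (best fit).
        fits = [b for b in self._free_blocks if b >= size]
        if fits:
            block = min(fits)
            self._free_blocks.remove(block)
            if block > size:
                # Split: the unused remainder stays cached in the pool.
                self._free_blocks.append(block - size)
        else:
            # Nothing cached fits, so the pool grows by asking the "driver".
            self.reserved += size
        self.allocated += size
        return size  # toy "handle": just the size of the allocation

    def free(self, size):
        # Freed memory returns to the pool, not to the driver,
        # so `reserved` does not shrink.
        self.allocated -= size
        self._free_blocks.append(size)

    def empty_cache(self):
        # Give all unused cached blocks back to the "driver".
        self.reserved -= sum(self._free_blocks)
        self._free_blocks.clear()


MB = 1024 ** 2
GB = 1024 ** 3

pool = ToyCachingPool()
pool.free(pool.malloc(5 * GB))  # an earlier 5GB workload, now freed
pool.malloc(5 * MB)             # current usage: 5MB
print(f"allocated={pool.allocated // MB}MB reserved={pool.reserved // GB}GB")

# The remaining 5GB - 5MB is served from the cache; the pool does not grow.
pool.malloc(5 * GB - 5 * MB)
print(f"allocated={pool.allocated // GB}GB reserved={pool.reserved // GB}GB")
```

In this model a small `allocated` next to a large `reserved` is the expected state after a large workload has finished, and a follow-up request up to `reserved - allocated` is satisfied without touching the driver at all.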