Using GPU via PyCall causes non-reusable memory allocation

My name is José Pereira, and I'm a Portuguese PhD student currently using Julia to develop a lightweight protein design package. I'm trying to integrate TorchANI, a well-known ML model for molecular energy calculation, developed in Python on top of PyTorch. My first attempt was to use the PyCall package to call the Python code directly, something like:

using PyCall
@pyimport torch
@pyimport torchani

device = torch.device("cuda")                                           # run on the GPU
model = torchani.models.ANI2x(periodic_table_index = true).to(device)   # load ANI2x and move it to the GPU
model(...)                                                              # called repeatedly inside my simulation loop
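
For reference, a single call looks roughly like the following (the species/coordinate values below are purely illustrative; TorchANI takes a tuple of a species tensor and a coordinates tensor and returns the energies):

# Illustrative only: a 3-atom molecule with arbitrary coordinates.
# With periodic_table_index = true, species are atomic numbers (8 = O, 1 = H).
species = torch.tensor([[8, 1, 1]], device = device)
coordinates = torch.tensor([[[0.0, 0.0, 0.0],
                             [0.0, 0.76, 0.59],
                             [0.0, -0.76, 0.59]]],
                           device = device, dtype = torch.float32)   # model weights are float32
energy = model((species, coordinates)).energies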

However, after a few hundred calls to the model, GPU memory keeps accumulating until a “CUDA out of memory” error is raised. This has been observed by others before. The problem seems to be related to the garbage-collection mechanism, as calling GC.gc(false) seems to help. I've used multiple profiling tools and verified that certain Python lines of code are allocating memory, such as:

p12_all = torch.triu_indices(num_atoms, num_atoms, 1, device=current_device)

When the same code runs directly in Python (not via PyCall in Julia), this allocation gets reused on the next iteration/call. In Julia, however, it is allocated anew at every step of the loop until an explicit call to GC.gc(false) is performed (which, of course, leads to extremely poor performance). At that point the memory on the GPU does not actually get freed: it remains allocated (cached by PyTorch) and is eventually reused.
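
For context, this is roughly the shape of the loop (a sketch: n_steps and inputs are placeholders for my actual simulation data):

# Sketch of the loop; GPU memory grows at every iteration unless
# GC.gc(false) is called explicitly from time to time.
for step in 1:n_steps
    energy = model(inputs)   # stands for the actual ANI2x call shown above
    # ... use `energy` ...
    if step % 100 == 0       # arbitrary interval, only to illustrate the workaround
        GC.gc(false)         # incremental collection releases the Python references
    end
end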

Therefore, my question is: is this behavior expected, or is it the default by design? Is there anything I can do to reuse the memory allocated by Python?

Thank you :smile:

I had a similar problem. The issue you are hitting is the GPU memory cache on the PyTorch side. Try

torch.cuda.empty_cache()

after each model call.

See How can we release GPU memory cache? - PyTorch Forums
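
In Julia, with the torch module imported through PyCall as in your first post, that would look something like this (the loop and inputs are placeholders):

for step in 1:n_steps
    energy = model(inputs)      # your actual ANI2x call
    torch.cuda.empty_cache()    # ask PyTorch to release its cached GPU blocks
end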

Thanks for your answer. I tried to simply add torch.cuda.empty_cache() after each model call, but it didn’t change a thing:

[Screenshot: GPU memory usage after a few iterations]

This is the memory allocation after just a few iterations; the memory keeps leaking.

It sounds like Julia’s garbage collection just isn’t running frequently enough for you, probably because Julia doesn’t know that memory is running low on the CUDA side?

You can explicitly tell Python you are done with an object o from PyCall by calling pydecref(o). (This is safe if you are done with the object: it gets mutated to a NULL object to prevent it from being decref’ed again. Perhaps the function should have been called pydecref!…) Equivalently, you can just call finalize(o).
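
For example, something along these lines (a sketch; model and inputs are whatever you already have):

for step in 1:n_steps
    o = model(inputs)       # a temporary PyObject holding GPU tensors
    # ... copy whatever you need out of `o` into plain Julia values ...
    pydecref(o)             # drop the Python reference now (equivalently: finalize(o))
end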

See also stop using finalizers for resource management? · Issue #11207 · JuliaLang/julia · GitHub and `with` for deterministic destruction · Issue #7721 · JuliaLang/julia · GitHub for discussion of this general resource-management issue in Julia.

I will take a look at these links and ideas and report back.
So far my solution has been to:

  1. Measure the current GPU allocation at every step.
  2. Once the current GPU allocation surpasses an established threshold (say, 50% of the total available GPU memory), record the number of elapsed steps as N, the maximum allowed number of calls between garbage collections.
  3. From then on, call GC.gc(false) every N steps (see the sketch below).
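
A rough sketch of that heuristic, assuming the model object from my first post and a hypothetical per-step inputs iterator; the GPU usage is queried with the standard torch.cuda.memory_allocated and torch.cuda.get_device_properties calls:

using PyCall
torch = pyimport("torch")                               # same module as in the first post

# Calibrate N on the fly, then call GC.gc(false) every N steps.
function run_with_adaptive_gc(model, inputs_per_step; threshold = 0.5)
    total_mem = torch.cuda.get_device_properties(0).total_memory
    gc_every = 0                                        # N, unknown until calibrated
    for (step, inputs) in enumerate(inputs_per_step)
        model(inputs)                                   # stands for the actual ANI2x call
        if gc_every == 0                                # still calibrating
            if torch.cuda.memory_allocated(0) / total_mem > threshold
                gc_every = step                         # threshold reached after `step` calls
                GC.gc(false)
            end
        elseif step % gc_every == 0
            GC.gc(false)                                # incremental GC releases the Python references
        end
    end
end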