Using GPU via PyCall causes non-reusable memory allocation

My name is José Pereira, and I'm a Portuguese PhD student currently using Julia to develop a lightweight protein design package. I'm trying to integrate TorchANI, a well-known ML model for molecular energy calculation, developed in Python on top of PyTorch. My first attempt was to use the PyCall package to call the Python code directly, something like:

using PyCall
@pyimport torch
@pyimport torchani

device = torch.device("cuda")                                           # run on the GPU
model = torchani.models.ANI2x(periodic_table_index = true).to(device)   # load ANI2x and move it to the GPU
model(...)                                                              # called repeatedly inside my simulation loop
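
For reference, a single call looks roughly like the following (the species/coordinate values below are purely illustrative; TorchANI takes a tuple of a species tensor and a coordinates tensor and returns the energies):

# Illustrative only: a 3-atom molecule with arbitrary coordinates.
# With periodic_table_index = true, species are atomic numbers (8 = O, 1 = H).
species = torch.tensor([[8, 1, 1]], device = device)
coordinates = torch.tensor([[[0.0, 0.0, 0.0],
                             [0.0, 0.76, 0.59],
                             [0.0, -0.76, 0.59]]],
                           device = device, dtype = torch.float32)   # model weights are float32
energy = model((species, coordinates)).energies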

However, after a few hundred calls to the model, GPU memory keeps accumulating until a “CUDA out of memory” error is raised. This has been observed by others before. The problem seems to be related to the garbage-collection mechanism, as calling GC.gc(false) seems to help. I've used multiple profiling tools and verified that certain Python lines of code are allocating memory, such as:

p12_all = torch.triu_indices(num_atoms, num_atoms, 1, device=current_device)

When the same code runs directly in Python (not via PyCall in Julia), this allocation gets reused on the next iteration/call. In Julia, however, it is allocated anew at every step of the loop until an explicit call to GC.gc(false) is performed (which, of course, leads to extremely poor performance). At that point the memory on the GPU does not actually get freed: it remains allocated (cached by PyTorch) and is eventually reused.
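
For context, this is roughly the shape of the loop (a sketch: n_steps and inputs are placeholders for my actual simulation data):

# Sketch of the loop; GPU memory grows at every iteration unless
# GC.gc(false) is called explicitly from time to time.
for step in 1:n_steps
    energy = model(inputs)   # stands for the actual ANI2x call shown above
    # ... use `energy` ...
    if step % 100 == 0       # arbitrary interval, only to illustrate the workaround
        GC.gc(false)         # incremental collection releases the Python references
    end
end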

Therefore, my question is: is this behavior expected, or is it the default by design? Is there anything I can do to reuse the memory allocated by Python?

Thank you :smile:

I had a similar problem. The issue you are hitting is the GPU memory cache on the PyTorch side. Try

torch.cuda.empty_cache()

after each model call.

See How can we release GPU memory cache? - PyTorch Forums
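
In Julia, with the torch module imported through PyCall as in your first post, that would look something like this (the loop and inputs are placeholders):

for step in 1:n_steps
    energy = model(inputs)      # your actual ANI2x call
    torch.cuda.empty_cache()    # ask PyTorch to release its cached GPU blocks
end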

Thanks for your answer. I tried to simply add torch.cuda.empty_cache() after each model call, but it didn’t change a thing:

[Screenshot: GPU memory usage after a few iterations]

This is the memory allocation after just a few iterations; the memory keeps leaking.

It sounds like Julia’s garbage collection just isn’t running frequently enough for you, probably because Julia doesn’t know that memory is running low on the CUDA side?

You can explicitly tell Python you are done with an object o from PyCall by calling pydecref(o). (This is safe if you are done with the object: it gets mutated to a NULL object to prevent it from being decref’ed again. Perhaps the function should have been called pydecref!…) Equivalently, you can just call finalize(o).
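
For example, something along these lines (a sketch; model and inputs are whatever you already have):

for step in 1:n_steps
    o = model(inputs)       # a temporary PyObject holding GPU tensors
    # ... copy whatever you need out of `o` into plain Julia values ...
    pydecref(o)             # drop the Python reference now (equivalently: finalize(o))
end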

See also stop using finalizers for resource management? · Issue #11207 · JuliaLang/julia · GitHub and `with` for deterministic destruction · Issue #7721 · JuliaLang/julia · GitHub for discussion of this general resource-management issue in Julia.

I will take a look at these links and ideas and report back.
So far my solution has been to:

  1. Measure the current GPU allocation at every step.
  2. Once the current GPU allocation surpasses an established threshold (say, 50% of the total available GPU memory), record the number of elapsed steps as N, the maximum allowed number of calls between garbage collections.
  3. From then on, call GC.gc(false) every N steps (see the sketch below).
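
A rough sketch of that heuristic, assuming the model object from my first post and a hypothetical per-step inputs iterator; the GPU usage is queried with the standard torch.cuda.memory_allocated and torch.cuda.get_device_properties calls:

using PyCall
torch = pyimport("torch")                               # same module as in the first post

# Calibrate N on the fly, then call GC.gc(false) every N steps.
function run_with_adaptive_gc(model, inputs_per_step; threshold = 0.5)
    total_mem = torch.cuda.get_device_properties(0).total_memory
    gc_every = 0                                        # N, unknown until calibrated
    for (step, inputs) in enumerate(inputs_per_step)
        model(inputs)                                   # stands for the actual ANI2x call
        if gc_every == 0                                # still calibrating
            if torch.cuda.memory_allocated(0) / total_mem > threshold
                gc_every = step                         # threshold reached after `step` calls
                GC.gc(false)
            end
        elseif step % gc_every == 0
            GC.gc(false)                                # incremental GC releases the Python references
        end
    end
end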