My name is José Pereira, and I'm a Portuguese PhD student currently using Julia to develop a lightweight protein design package. I'm trying to use TorchANI, a well-known ML model for molecular energy calculation, developed in Python on top of PyTorch. My first attempt was to call the Python code directly with the PyCall package, something like:
```julia
using PyCall

@pyimport torch
@pyimport torchani

device = torch.device("cuda")
model  = torchani.models.ANI2x(periodic_table_index = true).to(device)
model(...)
```
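For context, this is roughly how the model ends up being called inside my package. The water-molecule inputs below are placeholder values, not my real data; the `(species, coordinates)` tuple input and the `.energies` field follow TorchANI's documented interface:

```julia
# Placeholder input: a single water molecule. With periodic_table_index = true,
# the species tensor holds atomic numbers (O = 8, H = 1).
species     = torch.tensor([[8, 1, 1]], device = device)
coordinates = torch.tensor([[[0.00, 0.00, 0.00],
                             [0.00, 0.00, 0.96],
                             [0.93, 0.00, -0.24]]], device = device)

# The model is evaluated repeatedly, once per design step.
for step in 1:1_000
    energy = model((species, coordinates)).energies
end
```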
However, after a few hundred calls to the model, memory allocation causes a “CUDA out of memory” error. This has been previously observed by others. The problem seems to be related to the garbage-collection mechanism, as calling `GC.gc(false)` seems to help. Using multiple profiling tools, I have verified that certain Python lines of code allocate GPU memory on every call, such as:
```python
p12_all = torch.triu_indices(num_atoms, num_atoms, 1, device=current_device)
```
When running the same code directly in Python (not via PyCall in Julia), this allocation gets reused on the next iteration/call. In Julia, however, a new allocation is made on every step of the loop until an explicit call to `GC.gc(false)` is performed (which, of course, leads to extremely low performance when done every step). Even then, as we know, the memory allocated on the GPU does not actually get freed; it remains allocated in PyTorch's cache and is eventually reused.
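The only compromise I have found so far is to throttle the collection. A minimal sketch, assuming a fixed interval is acceptable (the interval of 100 is an arbitrary value I picked, not tuned):

```julia
# Throttled workaround sketch: run an incremental Julia GC only every
# GC_INTERVAL calls, trading some peak GPU memory for speed.
const GC_INTERVAL = 100  # hypothetical value; tune to the GPU's capacity

for step in 1:10_000
    energy = model((species, coordinates)).energies
    step % GC_INTERVAL == 0 && GC.gc(false)  # release Python references in batches
end
```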
Therefore, my question is: is this behavior expected or intended by default? Is there anything I can do to re-use the memory allocated by Python?
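For reference, this is the kind of check I am using (on top of the profilers) to watch the allocations grow, reusing `torch`, `model`, `species` and `coordinates` from the snippets above. `memory_allocated()` climbs on every call until a GC, while `memory_reserved()` shows the pool PyTorch keeps around for reuse:

```julia
# Print PyTorch's view of GPU memory after each model evaluation.
for step in 1:10
    energy = model((species, coordinates)).energies
    println("allocated: ", torch.cuda.memory_allocated() ÷ 1024^2, " MiB",
            " | reserved: ", torch.cuda.memory_reserved() ÷ 1024^2, " MiB")
end
```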