Freeing GPU memory with CUDAdrv / CUDAnative / CuArrays


I am writing some code that calls CUDA kernels via CUDAdrv, allocates some CuArrays and uses a generic matrix addition (which I think is done via CUDAnative).

The problem I have is that after I call this code a couple of times, my GPU runs out of memory; it seems that calling gc() does not free memory on the GPU.

What is the correct way to free memory in the GPU?
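For reference, the pattern is roughly the following (a hypothetical sketch; the array sizes and function names are illustrative, not my actual code):

```julia
using CuArrays  # provides CuArray and GPU broadcasting

# Each call allocates fresh device arrays and adds them on the GPU.
function step()
    a = CuArray(rand(Float32, 4096, 4096))
    b = CuArray(rand(Float32, 4096, 4096))
    c = a .+ b          # generic addition, compiled for the GPU via CUDAnative
    return Array(c)     # copy the result back to the host
end

for i in 1:100
    step()
    gc()                # GPU memory does not appear to be released here
end
```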


GPU memory is managed through the GC, although indirectly: when CuArray instances go out of scope and are collected by the Julia GC, the refcount on the underlying GPU buffer is lowered, and the memory is freed once it drops to 0. So make sure your arrays are out of scope before calling gc(), and make sure no other objects share the memory (e.g. through a view). You can enable debug messages that print during finalization by setting JULIA_DEBUG=CUDAdrv on 0.7, or TRACE=1 with --compile-cache=no on 0.6.
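Concretely, that means keeping allocations inside a scope that ends before you collect. A minimal sketch (assuming CUDAdrv's CuArray; the function name and sizes are illustrative):

```julia
using CUDAdrv

function work()
    d_a = CuArray{Float32}(1024)   # device allocation, refcount 1
    # ... launch kernels that read/write d_a ...
    return nothing                 # d_a becomes unreachable here
end

work()
gc()   # the finalizer runs, the refcount drops to 0, and the GPU buffer is freed
```

If you instead kept `d_a` (or a view of it) in a global or in a returned value, the refcount would stay above 0 and gc() could not release the buffer.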

Alternatively, you can force early collection by calling finalize on an array. IIRC this is a pretty slow call though, and we should probably add a different early-freeing mechanism. It also won’t do anything if the buffer’s refcount hasn’t dropped to 0.
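A sketch of that approach (again assuming CUDAdrv's CuArray):

```julia
using CUDAdrv

d_a = CuArray{Float32}(1024)
# ... use d_a ...
finalize(d_a)   # runs the finalizer now; frees the GPU buffer if its refcount hits 0
# d_a must not be used past this point: its device memory is gone
```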

EDIT: of course, I have assumed you’re talking about CUDAdrv’s CuArray. If you’re talking about CuArrays.jl, there’s an additional level of memory pooling. It should try to free memory by calling gc() once it encounters an out-of-memory error during allocation, and other than that the same rules from above apply (objects should be out of scope, refcounting).
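With CuArrays.jl the usual pattern is to not free anything explicitly and let the pool do its job (sketch; sizes are illustrative):

```julia
using CuArrays

# The pool retains freed device blocks for reuse, so repeated allocations of
# the same size are cheap; on an out-of-memory error it triggers gc() itself.
for i in 1:100
    a = cu(rand(Float32, 4096, 4096))  # pooled device allocation
    b = a .+ a                         # GPU broadcast
    # when a and b become unreachable, their blocks return to the pool
end
```

Note that pooled memory still counts as "in use" from the driver's perspective (e.g. in nvidia-smi), even though it is available for reuse within Julia.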