Question about garbage collection, AD, and CUDA

I ran into the issue described in this thread: Allocator very slow to reclaim memory after running for sufficiently long · Issue #137 · JuliaGPU/CUDA.jl · GitHub. For a machine learning workload, I noticed that if I don't manually disable and re-enable the GC in Julia while using the GPU (i.e. disabling the GC before the GPU section and re-enabling it once the GPU is done working), the code slows down significantly.
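
Roughly, the pattern looks like this (a minimal sketch; `model`, `data`, and `train_step!` are placeholders for my actual workload):

```julia
using CUDA

GC.enable(false)          # pause Julia's GC before the GPU-heavy section
try
    for batch in data
        train_step!(model, batch)   # placeholder for the real GPU work
    end
finally
    GC.enable(true)       # re-enable the GC once the GPU is done
    GC.gc()               # collect everything allocated in the meantime
end
```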

  1. Is this safe with regard to memory leaks and segfaults? The documentation (Memory management · CUDA.jl) says that manual memory management is not needed, so I was hoping I wouldn't have to code at a lower level.

  2. Which libraries allow reverse-mode automatic differentiation of a cost function that involves a derivative obtained with another AD library or with finite differencing? I looked through the threads and see that nested AD may not be supported right now. Concretely, I'm after something like the sketch below.
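
This is the kind of computation I mean (a sketch using Zygote over ForwardDiff as an example pairing; whether this works in general, e.g. for closures capturing parameters, is exactly what I'm unsure about):

```julia
using Zygote, ForwardDiff

f(x) = sin(x)

# the cost involves an inner derivative, here via forward-mode AD
cost(x) = ForwardDiff.derivative(f, x)^2      # = cos(x)^2

# outer reverse-mode gradient of the derivative-containing cost
g, = Zygote.gradient(cost, 1.0)               # ≈ -2cos(1.0)sin(1.0)
```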

It should be. You can of course run out of memory while the GC is disabled, but when you re-enable it, it will also collect memory that was allocated during that time.

Regarding manual memory management: you can always help the GC along by adding calls to CUDA.unsafe_free! where possible. This also helps if you're already disabling the GC, because memory can be reused more quickly.
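
For example (a minimal sketch, assuming the array is genuinely no longer needed):

```julia
using CUDA

x = CUDA.rand(1024, 1024)      # device array
y = sum(x .* x)                # last use of x
CUDA.unsafe_free!(x)           # return x's memory to the pool right away;
                               # x must not be used after this call
```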
