Resetting Device

jonathan-laurent · July 5, 2021, 6:42pm

I just want to mention that these types of issues have been a serious headache for AlphaZero.jl from the very start. To be fair, the situation is much better now than it used to be: I remember a time under CuArrays 1.2 when 90% of training time was spent in GC! But I am probably still leaving a 2x performance factor on the table, just because of bad memory management (AlphaZero.jl may be hit even harder than @Fabrice_Rosay’s implementation, as it performs more allocations for the sake of modularity).

As was discussed in this thread, this is one of the rare places where Python’s ref-counting strategy is actually a great win. And I have never been completely satisfied by the answers provided in the thread I cited, which basically come down to “this is not a big deal in Julia as Julia makes it pretty easy not to allocate when necessary”.

I am still wondering how much of this problem could be solved simply with a better runtime and how much will always come down to having developers eliminate allocations in their code and free resources manually when needed. In the latter case, having powerful tooling to identify memory management issues and fix them strikes me as particularly important.
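
To make "free resources manually" concrete, here is a minimal sketch of what that can look like with CUDA.jl. The helper `with_scratch` is a hypothetical name made up for this illustration, not something from AlphaZero.jl or CUDA.jl. It only relies on `CUDA.unsafe_free!`, `CUDA.reclaim`, and `GC.gc`, and it assumes the buffer is not referenced anywhere else once the function returns.

```julia
using CUDA

# Hypothetical helper (illustration only): run `f` on a temporary GPU buffer
# and release that buffer eagerly instead of waiting for the GC to find it.
function with_scratch(f, ::Type{T}, dims...) where {T}
    buf = CuArray{T}(undef, dims...)
    try
        return f(buf)
    finally
        # Hand the memory back to the pool right away. This is only safe
        # because `buf` must not be used after this point.
        CUDA.unsafe_free!(buf)
    end
end

if CUDA.functional()
    total = with_scratch(Float32, 1024, 1024) do buf
        fill!(buf, 1f0)
        sum(buf)          # reduce to a scalar so `buf` does not escape
    end

    GC.gc()               # collect CuArrays that are no longer reachable
    CUDA.reclaim()        # ask the pool to return cached memory to the driver
end
```

The try/finally pattern is the manual counterpart of what ref-counting gives for free: the buffer is released at a known point, at the cost of the developer having to guarantee that nothing still holds on to it.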
