I was probably unclear about the slowdown: it is still the same problem I mentioned in "GC hitting hard".
Changing the initialization lowered memory pressure, but the slowdown keeps happening. I'm fairly sure it is memory-related, since it strikes sooner when working on bigger games. So I thought that completely resetting the device might make the next iteration as fast as the first one. My worry was that functions would then have to be recompiled, but that turned out not to be the case.
But resetting didn't prevent the slowdown. It also forced CUDA to create a new memory pool, which wasted a lot of time, so it's definitely not a solution.
So all I’m left with is trying to manage memory manually and see if it works.
`CUDA.unsafe_free!` was not working on unmanaged arrays created with `unsafe_wrap`, so what I'm trying now is to free the underlying buffer with `CUDA.Mem.free`.
As a side note, I know disabling the GC is not a magic fix. But I also know there is a problem when GC pressure and CUDA.jl are mixed, and I would like to find a solution, if only to see whether I can get my implementation as fast as the C++ one. (I also talked to Jonathan Laurent, the creator of AlphaZero.jl, and he told me he has memory issues too.)
Thank you for your help!