GC hitting hard

Well, I’m a bit ashamed, but I have almost solved the problem.
CUDA and Flux are not to blame.
When doing the calculation on GPU for AlphaGPU, you initialize huge buffers on the device. Well, it turns out I only thought they were on the device… I did all the initialization in the following manner, for example:

features=CuArray(zeros(Float32,(81,32*1024)))

This had the side effect of also allocating huge and useless buffers on the CPU, which the GC then had to deal with. Whether because launching CUDA kernels or using Flux on the GPU can trigger it, or simply because on-device buffers are themselves managed by the GC, the GC was triggered at random points in the supposedly GPU-only part of the program. Each time it faced a huge number of Arrays to free, hence a lot of pressure.
I changed the initialization to use the dedicated CUDA functions:

features=CUDA.zeros(Float32,(81,32*1024))

This was sufficient to remove almost all GC pressure, leading to more stable and faster iterations.
On the dark side, shame; on the bright side, it can now “solve” 4 in a row in around 25 minutes.
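For scale, here is a rough sketch of how big the host-side temporary is for the buffer above. It uses only Base Julia (no GPU needed); the names dims, host_bytes, alloc_host and measured are made up for the illustration. The CuArray(zeros(...)) form builds exactly this kind of Array on the CPU before copying it to the device, once per buffer initialized that way.

```julia
# Shape of the example buffer from the post.
const dims = (81, 32 * 1024)

# Expected size of the host Array that zeros(Float32, dims) creates.
host_bytes = prod(dims) * sizeof(Float32)   # 10_616_832 bytes
host_mib = host_bytes / 2^20                # 10.125 MiB

# Measure what the host allocation really costs.
alloc_host() = zeros(Float32, dims)
alloc_host()                                # warm-up, so compilation is not counted
measured = @allocated alloc_host()

println("expected: ", host_bytes, " bytes (", host_mib, " MiB)")
println("measured: ", measured, " bytes")
```

So every buffer created the first way leaves about 10 MiB of garbage on the CPU for the GC to collect, which the CUDA.zeros form avoids.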
Sorry for the misleading first post.
“What does not kill us makes us stronger”

PS: @dhairyagandhi96,

(a::Dense)(x) = a.σ.(a.W * x .+ a.b)

seems slightly faster and allocates less.
