Hello,
I have read that there is no garbage collection on GPUs. I have an algorithm where I use an Array{Array{CuArray{Float32,1},1},1} as a buffer. From that I sample a batch, which I transform into an NTuple{N,CuArray} to feed a training loop in Flux. The buffer has a fixed size and I regularly generate new elements (each an Array{CuArray{Float32,1},1}) to replace the old ones. I think that when I do that, I replace pointers to CuArrays with other ones without freeing the VRAM. I know that one solution is to bring the CuArray back to the CPU and then erase it, but I think that is highly inefficient and very slow. My first question is: is there a way to free that memory without the costly transfer to the CPU?
Here’s an example:
using CuArrays, Flux
buffer = Vector{Vector{CuArray{Float32,1}}}()
# ... populate the buffer to its fixed size
x = Vector{CuArray{Float32,1}}()
y = Vector{CuArray{Float32,1}}() # (say that N = 2)
batch = rand(buffer, 20)
for element in batch
    push!(x, element[1])
    push!(y, element[2])
end
x = Flux.batch(x) # produces a CuArray{Float32,2}
y = Flux.batch(y) # produces a CuArray{Float32,2}
data = (x, y)
# ... train a network on data
newElement = generateanewelement() # returns an Array{CuArray{Float32,1},1}
push!(buffer, newElement)
popfirst!(buffer) # the popped element is an array of pointers, the VRAM is not freed
I thought that I could simply overwrite the old element with the new one at the same location (which would be the most efficient way to go). Say I do it this way:
function overwriteoldelement(buffer, indexofoldest)
    buffer[indexofoldest][1] = generatenewX()
    buffer[indexofoldest][2] = generatenewY() # these two return CuArrays
end
I don’t think this overwrites the memory; it simply changes the pointer or something like that, right?
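To make the distinction I am asking about concrete, here is a minimal sketch. It uses plain CPU Arrays as a stand-in for CuArrays (an assumption, so that it runs without a GPU; I have not verified that CuArrays behave the same way), and the sizes and values are made up for illustration. The first variant is what my function above does; the second is what I would call a real overwrite.

```julia
# Plain CPU arrays stand in for CuArrays here (assumption: lets this run without a GPU).
buffer = [[zeros(Float32, 3), zeros(Float32, 3)]]
old = buffer[1][1]                     # keep a handle on the array currently in the slot

# Variant 1: indexed assignment replaces the reference stored in the slot.
buffer[1][1] = ones(Float32, 3)
rebound = buffer[1][1] !== old         # true: the slot now holds a different array
untouched = old == zeros(Float32, 3)   # true: the original array was not modified

# Variant 2: copyto! writes the new values into the array already in the slot.
slot = buffer[1][2]
copyto!(buffer[1][2], ones(Float32, 3))
inplace = buffer[1][2] === slot        # true: same array object, new contents
```

If the second variant carries over to CuArrays, it would reuse the existing allocation instead of creating a new one, but it still requires the new data to be generated somewhere first.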