Why is it consuming and not freeing GPU memory?

Hi,
I’m running into something I don’t understand: my simple CUDA code keeps
consuming memory and never frees it. A simple example:

julia> using CUDA

julia> CUDA.memory_status()
Effective GPU memory usage: 4.16% (501.188 MiB/11.759 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

julia> kk = CUDA.rand(256,256,256)
julia> aux = CUDA.rand(256,256,256)

julia> CUDA.memory_status()
Effective GPU memory usage: 19.58% (2.302 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)

julia> for i in 1:10
    aux .= CUDA.exp.(kk)
    CUDA.memory_status()
end
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)
Effective GPU memory usage: 19.64% (2.310 GiB/11.759 GiB)
Memory pool usage: 128.000 MiB (128.000 MiB reserved)

OK, that’s what I would expect: no new memory is being allocated on the GPU.
But if I run

julia> for i in 1:10
    aux .= CUDA.exp.(0.01f0*kk)
    CUDA.memory_status()
end
Effective GPU memory usage: 20.26% (2.382 GiB/11.759 GiB)
Memory pool usage: 192.000 MiB (192.000 MiB reserved)
Effective GPU memory usage: 20.79% (2.445 GiB/11.759 GiB)
Memory pool usage: 256.000 MiB (256.000 MiB reserved)
Effective GPU memory usage: 21.32% (2.507 GiB/11.759 GiB)
Memory pool usage: 320.000 MiB (320.000 MiB reserved)
Effective GPU memory usage: 21.85% (2.570 GiB/11.759 GiB)
Memory pool usage: 384.000 MiB (384.000 MiB reserved)
Effective GPU memory usage: 22.39% (2.632 GiB/11.759 GiB)
Memory pool usage: 448.000 MiB (448.000 MiB reserved)
Effective GPU memory usage: 22.92% (2.695 GiB/11.759 GiB)
Memory pool usage: 512.000 MiB (512.000 MiB reserved)
Effective GPU memory usage: 23.45% (2.757 GiB/11.759 GiB)
Memory pool usage: 576.000 MiB (576.000 MiB reserved)
Effective GPU memory usage: 23.98% (2.820 GiB/11.759 GiB)
Memory pool usage: 640.000 MiB (640.000 MiB reserved)
Effective GPU memory usage: 24.51% (2.882 GiB/11.759 GiB)
Memory pool usage: 704.000 MiB (704.000 MiB reserved)
Effective GPU memory usage: 25.04% (2.945 GiB/11.759 GiB)
Memory pool usage: 768.000 MiB (768.000 MiB reserved)

it keeps consuming memory and not freeing it, and nothing seems
to be releasing it:

julia> CUDA.memory_status()
Effective GPU memory usage: 25.52% (3.000 GiB/11.759 GiB)
Memory pool usage: 768.000 MiB (768.000 MiB reserved)

julia> CUDA.reclaim()

julia> CUDA.memory_status()
Effective GPU memory usage: 25.45% (2.993 GiB/11.759 GiB)
Memory pool usage: 768.000 MiB (768.000 MiB reserved)

This is a minimal working example, but in my iterative codes this sometimes
makes my Linux box hang…

What am I doing wrong?

Thanks in advance…

Did you mean to do this?

julia> for i in 1:10
    aux .= CUDA.exp.(0.01f0.*kk)
    CUDA.memory_status()
end

Notice .* instead of * for broadcasting.
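
The difference matters for allocations. With the non-dotted *, the product 0.01f0*kk is evaluated first and materializes a fresh temporary CuArray (64 MiB for a 256×256×256 Float32 array) on every iteration; those temporaries pile up in the pool as uncollected garbage, which is exactly the 64 MiB-per-iteration growth shown above. With .*, the scalar multiplication fuses with the exp broadcast into a single kernel that writes straight into aux, so nothing extra is allocated. A minimal sketch of the two variants:

julia> aux .= CUDA.exp.(0.01f0 * kk)   # materializes a 64 MiB temporary each time

julia> aux .= CUDA.exp.(0.01f0 .* kk)  # fully fused: one kernel, no temporary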

Judging by the docs, you can try setting the JULIA_CUDA_SOFT_MEMORY_LIMIT or JULIA_CUDA_HARD_MEMORY_LIMIT environment variable (not sure, but I think it needs to be set before loading CUDA). Also consider calling CUDA.reclaim() or GC.gc(), or manually freeing arrays with CUDA.unsafe_free!(a).
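
For example (a minimal sketch; the 2GiB value is just an illustration, and my understanding is that the variable is read when CUDA.jl initializes, so it must be set before the package is loaded):

julia> ENV["JULIA_CUDA_SOFT_MEMORY_LIMIT"] = "2GiB"  # before `using CUDA`

julia> using CUDA

or equivalently from the shell:

$ JULIA_CUDA_SOFT_MEMORY_LIMIT=2GiB julia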

You should generally not use the same GPU for driving a display and doing computationally-intensive things. It’s actually the compute that will make the output “hang”, not the use of memory.

And regarding the use of memory, Julia is a garbage collected language, so memory will only be collected when it’s necessary.
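
That is also why CUDA.reclaim() alone did nothing in the example above: the ten temporaries were already unreachable, but the GC had not run yet, so the pool still counted them as in use. Collecting first and then reclaiming should actually shrink the pool (a sketch only; see the caveat below):

julia> GC.gc()            # run finalizers on the unreachable temporaries

julia> CUDA.reclaim()     # hand the now-free pool memory back to the driver

julia> CUDA.memory_status()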

Please don’t call CUDA.reclaim() or even GC.gc() unless really necessary; both operations will significantly slow down your application if misused. unsafe_free!ing unused memory can be good practice, but as the name implies it is an unsafe operation, so it should be done with care.
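
If you really do need the temporary, a pattern like the following sketch frees it deterministically as soon as it is dead, without waiting for the GC:

julia> for i in 1:10
           tmp = 0.01f0 * kk       # the 64 MiB temporary
           aux .= CUDA.exp.(tmp)
           CUDA.unsafe_free!(tmp)  # back to the pool immediately; tmp must not be used after this
       end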

You can also try the changes from “Consider running GC when allocating and synchronizing” by maleadt (Pull Request #2304 · JuliaGPU/CUDA.jl · GitHub), which will consider collecting memory at more points than only when running out of it.

Hi,
it seems that setting a soft limit on the amount of GPU memory to use via
JULIA_CUDA_SOFT_MEMORY_LIMIT
solves the problem, at least in the very first tests I’m running now. Thanks!
Best,
Ferran.