Hello. First of all, I am using Julia v1.12.6, CUDA v5.11.2, and FLoops v0.2.2. Consider the following code and comments.
```julia
using CUDA
using Base.Threads

function func1()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
    end
end

function func2()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    @threads for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
    end
    return
end

# GPU almost empty
println("\nA")
CUDA.memory_status()

# do the task without threading
func1()

# memory still allocated, but that is okay
println("\nB")
CUDA.memory_status()

# manually clean up
GC.gc()
CUDA.reclaim()

# GPU empty
println("\nC")
CUDA.memory_status()

# do the task with threading
func2()

# memory still allocated, but that is okay
println("\nD")
CUDA.memory_status()

# manually clean up
GC.gc()
CUDA.reclaim()

# NOT DEALLOCATED! Bug?
println("\nE")
CUDA.memory_status()
```
When I run this code, I get the following output:
```text
A
Effective GPU memory usage: 3.39% (821.250 MiB/23.643 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

B
Effective GPU memory usage: 74.37% (17.583 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

C
Effective GPU memory usage: 3.39% (821.250 MiB/23.643 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

D
Effective GPU memory usage: 74.39% (17.587 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

E
Effective GPU memory usage: 74.39% (17.587 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)
```
Note that the memory is not freed between D and E. What is going on?
This is the best current workaround:

```julia
function func2()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    @threads for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
        A_ = nothing
    end
    # explicitly release the buffer instead of waiting for the GC
    CUDA.unsafe_free!(A)
    return
end
```
The following also works, though I am not sure whether every thread is guaranteed to be utilized in general:

```julia
function func2()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    @threads for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
        A_ = nothing
    end
    # run the GC and reclaim on every thread
    @threads for _ in 1:nthreads()
        println(threadid())
        GC.gc()
        CUDA.reclaim()
    end
    return
end
```
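To help narrow down whether this is GC behavior rather than something CUDA-specific, here is a CPU-only diagnostic sketch. This is entirely my own construction (not from the report above): it attaches finalizers to heap objects created inside an `@threads` loop and counts how many of them a subsequent main-thread `GC.gc()` actually runs. If the count stays well below the iteration count, the view wrappers in the GPU case would plausibly be kept alive the same way.

```julia
using Base.Threads

function finalizer_check(n = 1000)
    ran = Threads.Atomic{Int}(0)
    @threads for i in 1:n
        x = Ref(i)  # one heap-allocated object per iteration
        # count each finalizer invocation; finalizers run when the GC
        # actually collects the object, not when it goes out of scope
        finalizer(_ -> Threads.atomic_add!(ran, 1), x)
    end
    GC.gc()  # full collection by default
    return ran[]  # finalizers run so far, out of n
end

println(finalizer_check())
```

I have not verified the printed count on this setup, so treat it as an experiment, not a result.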
However, using `JULIA_CUDA_MEMORY_POOL=none` does not help. The output is as follows. In particular, note that the effective GPU memory usage is still above 17 GiB.
```text
A
Effective GPU memory usage: 1.66% (401.688 MiB/23.643 GiB)
No memory pool is in use.

B
Effective GPU memory usage: 72.57% (17.158 GiB/23.643 GiB)
No memory pool is in use.

C
Effective GPU memory usage: 1.66% (401.688 MiB/23.643 GiB)
No memory pool is in use.

D
Effective GPU memory usage: 72.57% (17.158 GiB/23.643 GiB)
No memory pool is in use.

E
Effective GPU memory usage: 72.57% (17.158 GiB/23.643 GiB)
No memory pool is in use.
```
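For reference, this is how I set the variable. As far as I understand, it is read when CUDA.jl initializes, so it has to be set before starting Julia (or at least before loading CUDA); the script name here is just a placeholder.

```shell
# Disable the CUDA.jl memory pool for this run; "script.jl" is a placeholder.
JULIA_CUDA_MEMORY_POOL=none julia --threads=auto script.jl
```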
This looks like a bug to me. Do you agree, and if so, where do you suggest I report it?