Hello. First of all, I am using Julia v1.12.6, CUDA v5.11.2, and FLoops v0.2.2. Consider the following code and comments.
```julia
using CUDA
using Base.Threads

function func1()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
    end
end

function func2()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    @threads for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
    end
    return
end

# GPU almost empty
println("\nA")
CUDA.memory_status()

# do the task without threading
func1()

# memory still allocated, but that is okay
println("\nB")
CUDA.memory_status()

# manually clean up
GC.gc()
CUDA.reclaim()

# GPU empty
println("\nC")
CUDA.memory_status()

# do the task with threading
func2()

# memory still allocated, but that is okay
println("\nD")
CUDA.memory_status()

# manually clean up
GC.gc()
CUDA.reclaim()

# NOT deallocated! Bug?
println("\nE")
CUDA.memory_status()
```
When I run this code, I get the following output:
```
A
Effective GPU memory usage: 3.39% (821.250 MiB/23.643 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

B
Effective GPU memory usage: 74.37% (17.583 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

C
Effective GPU memory usage: 3.39% (821.250 MiB/23.643 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

D
Effective GPU memory usage: 74.39% (17.587 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

E
Effective GPU memory usage: 74.39% (17.587 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)
```
Note that the memory is not freed between D and E, even though the same `GC.gc()` plus `CUDA.reclaim()` sequence fully freed it between B and C. The only difference is that `func2` uses `@threads`. What is going on?
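For reference, one variant I am considering is freeing the array explicitly instead of relying on the GC. This is only a sketch: `func2_explicit` is my own hypothetical name, and I am assuming `CUDA.unsafe_free!` eagerly returns the buffer to CUDA.jl's memory pool (as its docstring suggests), which I would expect to sidestep the GC entirely. I have not confirmed that this avoids the behavior above.

```julia
using CUDA
using Base.Threads

function func2_explicit()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    @threads for i in 1:size(A, 3)
        A_ = view(A, :, :, i)  # views only wrap A; they own no GPU memory
    end
    # Eagerly hand the buffer back to CUDA.jl's pool instead of waiting
    # for the Julia GC to collect A. After this, A must not be used again.
    CUDA.unsafe_free!(A)
    return
end
```

Is explicit freeing like this the recommended pattern when arrays are touched from `@threads` loops, or should `GC.gc()` + `CUDA.reclaim()` also work here?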