CUDA.jl with @threads causing memory leak?

Hello. First of all, I am using Julia v1.12.6, CUDA v5.11.2, and FLoops v0.2.2. Consider the following code and its comments:

using CUDA
using Base.Threads

function func1()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    for i in 1:size(A,3)
        A_ = view(A, :, :, i)
    end
end

function func2()
    A = CuArray{Float64}(undef, 1500, 1500, 1000)
    @threads for i in 1:size(A,3)
        A_ = view(A, :, :, i)
    end
    return
end

# GPU almost empty
println("\nA")
CUDA.memory_status()

# do task without threading
func1()

# memory still allocated, but that is okay
println("\nB")
CUDA.memory_status()

# manually clean up
GC.gc()
CUDA.reclaim()

# GPU almost empty again (same as A)
println("\nC")
CUDA.memory_status()

# do task with threading
func2()

# memory still allocated, but that is okay
println("\nD")
CUDA.memory_status()

# manually clean up
GC.gc()
CUDA.reclaim()

# memory NOT deallocated. Bug?
println("\nE")
CUDA.memory_status()

When I run this code, I get the following output:

A
Effective GPU memory usage: 3.39% (821.250 MiB/23.643 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

B
Effective GPU memory usage: 74.37% (17.583 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

C
Effective GPU memory usage: 3.39% (821.250 MiB/23.643 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

D
Effective GPU memory usage: 74.39% (17.587 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

E
Effective GPU memory usage: 74.39% (17.587 GiB/23.643 GiB)
Memory pool usage: 16.764 GiB (16.781 GiB reserved)

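Note that the memory is not freed between D and E, even after GC.gc() and CUDA.reclaim(): the pool still reports 16.764 GiB in use, whereas the same cleanup frees everything after func1 (B to C). What is going on?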
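Since the pool reports the 16.764 GiB as *used* (not just reserved) at E, it looks like the array object itself is still reachable after func2 returns. A smaller CPU-only check could track the array with a `WeakRef` and see whether a full collection clears it. This is only a sketch: the helper names (`alloc_plain`, `alloc_threads`, `survives_gc`) are hypothetical, the sizes are arbitrary, and a plain `Array` stands in for the `CuArray`, so the result may differ from the GPU case and may depend on the Julia version and thread count (e.g. `julia -t 4`).

```julia
using Base.Threads

# Hypothetical helper: allocate an array, touch it in a plain loop,
# and hand back only a weak reference to it.
function alloc_plain()
    A = Array{Float64}(undef, 100, 100, 10)  # CPU stand-in for the CuArray
    for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
    end
    return WeakRef(A)
end

# Hypothetical helper: same as above, but the loop body runs in @threads tasks.
function alloc_threads()
    A = Array{Float64}(undef, 100, 100, 10)
    @threads for i in 1:size(A, 3)
        A_ = view(A, :, :, i)
    end
    return WeakRef(A)
end

# Returns true if the array behind the WeakRef survived a full collection,
# i.e. something still holds a strong reference to it.
function survives_gc(make_ref)
    w = make_ref()
    GC.gc(true)
    return w.value !== nothing
end

println("plain loop:    still reachable after GC? ", survives_gc(alloc_plain))
println("@threads loop: still reachable after GC? ", survives_gc(alloc_threads))
```

If the `@threads` version reports `true` while the plain loop reports `false`, that would suggest something related to the threaded tasks keeps the captured array alive, rather than CUDA's pool failing to release freed memory.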