CUDA.jl with @threads causing memory leak?

Does the multithreaded func2 (without CUDA.unsafe_free! or CUDA.reclaim) actually run out of memory if you call it in a loop?
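
Something like the following could be used to check that. It is only a rough sketch: `func2_standin` is a hypothetical placeholder for your `func2` (I am assuming it allocates GPU arrays inside a `@threads` loop), and the array size and iteration counts are arbitrary.

```julia
using CUDA
using Base.Threads

# Hypothetical stand-in for the original `func2`: every thread allocates
# a GPU array and never frees it explicitly (no unsafe_free!, no reclaim).
function func2_standin(n = 1024)
    @threads for i in 1:nthreads()
        a = CUDA.rand(Float32, n, n)
        sum(a)  # reduce to a scalar so the GPU work actually runs
    end
    return nothing
end

# Call it repeatedly and print the memory usage every 100 iterations.
for iter in 1:1000
    func2_standin()
    if iter % 100 == 0
        println("iteration ", iter)
        CUDA.memory_status()  # prints device and memory pool usage
    end
end
```

If the reported usage keeps growing until an out-of-memory error is thrown, that points to a real leak. If it levels off, the memory is probably just held in the pool until the GC gets to it.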

I guess you have seen this comment.

It would also be interesting to see what happens if you call CUDA.reclaim on all threads.
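
A minimal sketch of what I mean, assuming that `:static` scheduling with exactly `nthreads()` iterations runs one iteration on each thread:

```julia
using CUDA
using Base.Threads

# Run CUDA.reclaim() once per Julia thread.
# Assumption: with :static scheduling and nthreads() iterations,
# each thread executes exactly one iteration.
@threads :static for i in 1:nthreads()
    CUDA.reclaim()
end

CUDA.memory_status()  # check whether the cached memory was released
```

You could run this after each call to func2, or only once memory usage gets high, and compare the `CUDA.memory_status()` output before and after.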