Reporting allocations per stream in a multithreaded CUDA.jl application

Is there a reliable way to monitor CUDA.jl allocations for each stream in a multithreaded application? While trying to track down the source of some excessive allocations using the `CUDA.@time` macro, I am seeing some operations appear to allocate when they clearly don't, e.g.:

    CUDA.synchronize()
    @info "Not doing anything here..."
    @CUDA.time sleep(0.5)
    @info "Done"
    CUDA.synchronize()

which results in:

    [ Info: Not doing anything here...
      0.500947 seconds (857 CPU allocations: 43.234 KiB) (5 GPU allocations: 5.008 MiB, 0.01% memmgmt time)
    [ Info: Done

I assume that what is being reported here are allocations occurring on another stream/thread? Is this how `CUDA.@time` works, or is something else wacky going on here? Is there a better method for reporting allocations specifically for each stream/thread?

Yeah, `CUDA.@time` currently doesn't take tasks or threads into account. That would be a useful addition, though. Could you open an issue on the CUDA.jl repository?
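
For illustration, here's a minimal sketch that reproduces the effect (the task body and allocation sizes are made up for the demo, not taken from the original application): a concurrent task allocates on the GPU while the main task times a plain `sleep`, and those allocations show up in the report anyway, since the stats aren't tracked per task.

    using CUDA

    # A concurrent task that allocates on the GPU while we time something else.
    # `sleep` yields to the scheduler, so this task gets a chance to run.
    t = @async begin
        for _ in 1:5
            CUDA.zeros(Float32, 256, 1024)  # ~1 MiB per allocation
            sleep(0.05)
        end
    end

    # Times a CPU-only `sleep`, yet the GPU allocations made by the task
    # above are counted as well, because the counters are global.
    CUDA.@time sleep(0.5)

    wait(t)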