CUDA Array broadcasting: possible to use a different stream for a code block?

samo · September 18, 2020, 1:36pm

I am wondering whether it is possible to specify the stream for a code block of array computations that rely on CuArray’s broadcasting, e.g. A .= A .+ B in this snippet:

using CUDA
A = CUDA.zeros(2,3)
B = CUDA.ones(2,3)
A .= A .+ B

Thanks!!

maleadt · September 21, 2020, 6:35am

There currently isn’t. We need to set up some global but task-local state to set/get the stream, and make all functions (like broadcast) use that. For now though, we’ve switched to using the implicit per-thread stream, so if you perform those computations on a separate thread you should get the same effect.
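
A minimal sketch of the separate-thread workaround described above, assuming Julia was started with more than one thread (e.g. julia -t 4) and a CUDA.jl version that uses the implicit per-thread stream. Threads.@spawn does not pin the task to a particular thread, so whether the broadcast actually runs on a different stream than the main thread's is an assumption here, not a guarantee:

```julia
using CUDA

A = CUDA.zeros(2, 3)
B = CUDA.ones(2, 3)

# Run the broadcast inside a task that the scheduler may place on another thread.
# With per-thread streams, the kernel is then launched on that thread's stream.
t = Threads.@spawn begin
    A .= A .+ B
    synchronize()  # wait for the GPU work before the task finishes
end

wait(t)            # block the main task until the spawned task is done
```

After wait(t) returns, A can be read from the main task as usual, for example with Array(A).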

samo · September 21, 2020, 8:31am

Thanks @maleadt for the reply. However, we would like to run these computations on a high priority stream, whereas implicit per-thread streams are normal priority, I assume.
