Check out the following: CUDA streams do not overlap
… and note that you can also use ParallelStencil.ParallelKernel.@get_priority_stream(i).
However, you might rather want to create one or a few larger kernels instead of all these small kernels…