CUDAnative: examples using CUDA streams?

samo · May 28, 2019, 4:55pm

From this topic we can see that CUDA streams are supported. Are there any code examples using CUDA streams in CUDAnative to help with first steps?
I.e. how to create a stream, use it in a kernel call, sync on the stream, etc.

maleadt · May 28, 2019, 5:49pm

Lacking proper documentation (which I hope I’ll be able to get to in a couple of weeks), have a look at the tests:

Basically, stream creation etc. is part of CUDAdrv, and in CUDAnative you just pass a stream argument to @cuda. FYI, there isn’t a good mechanism to use streams with CuArrays yet.
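
As a first-steps illustration of the above, here is a minimal sketch, assuming the 2019-era packages CUDAdrv, CUDAnative and CuArrays are installed and a CUDA-capable GPU is available. The kernel name `add_one!` and the array size are made up for the example. The CuArray is only used as a device buffer passed to the kernel; its allocation and the copy back to the host do not go through the custom stream.

```julia
using CUDAdrv, CUDAnative, CuArrays

# Hypothetical example kernel: each thread increments one element.
function add_one!(a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] += 1f0
    end
    return nothing
end

n = 1024
a = CuArray(zeros(Float32, n))   # device buffer (allocated outside the custom stream)

s = CuStream()                   # stream creation lives in CUDAdrv

# Pass the stream to the launch via the `stream` keyword of @cuda.
@cuda threads=256 blocks=cld(n, 256) stream=s add_one!(a)

# Block the host until all work queued on `s` has finished.
synchronize(s)

result = Array(a)                # copy back to the host for inspection
```

The launch is asynchronous with respect to the host, so the explicit `synchronize(s)` is what guarantees the kernel has finished before the result is read.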

samo · May 31, 2019, 5:52pm

Thanks for the examples!

samo · June 4, 2019, 8:24am

@maleadt, to my understanding, these tests only check that CuStream creates a new, distinct stream at every invocation, but they do not test that these streams actually overlap at execution, i.e. that the kernels on these streams run concurrently (or am I getting this wrong?). Is there any test that checks this functionality?
I am asking because I cannot get streams to overlap, as reported in this topic. This is fundamental for overlapping communication and computation in my application…
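
One possible starting point for checking overlap is sketched below. It is only a rough timing comparison under stated assumptions, not an existing test from the packages: the kernel `spin!`, the iteration count and the launch sizes are made up, and whether any overlap shows up depends on the device, the driver and the platform. A profiler timeline would be a more reliable way to confirm concurrency.

```julia
using CUDAdrv, CUDAnative, CuArrays

# Hypothetical long-running kernel: a small launch that keeps a few threads busy,
# so that two such kernels could fit on the device at the same time.
function spin!(a, iters)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        x = a[i]
        for _ in 1:iters
            x = x * 0.5f0 + 1f0    # converges towards 2, stays finite
        end
        a[i] = x
    end
    return nothing
end

a = CuArray(ones(Float32, 256))
b = CuArray(ones(Float32, 256))
s1 = CuStream()
s2 = CuStream()

# Warm-up launch so that compilation time is not measured below.
@cuda threads=256 stream=s1 spin!(a, 1)
synchronize(s1)

iters = 10_000_000   # arbitrary; tune so one kernel runs long enough to measure

# Both kernels on the same stream: they must run one after the other.
t_same = @elapsed begin
    @cuda threads=256 stream=s1 spin!(a, iters)
    @cuda threads=256 stream=s1 spin!(b, iters)
    synchronize(s1)
end

# One kernel per stream: they may run concurrently if the device allows it.
t_two = @elapsed begin
    @cuda threads=256 stream=s1 spin!(a, iters)
    @cuda threads=256 stream=s2 spin!(b, iters)
    synchronize(s1)
    synchronize(s2)
end

println("same stream: ", t_same, " s, two streams: ", t_two, " s")
```

If the two-stream time is clearly below the same-stream time, the kernels probably overlapped; similar timings do not prove the opposite, since launch overhead and driver batching can hide the effect.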

maleadt · June 4, 2019, 12:10pm

No. If you have any suggestions for such tests, let me know.

samo · June 6, 2019, 9:12am

I do not have any suggestions right now, but I will let you know if I come up with something during my investigations on overlapping of streams.

anj · September 18, 2019, 5:54pm

Any ideas or hints on when (or if) streams will be supported in CuArrays?

I am using CuArrays, but the GPU is not fully utilized; I think there is room for 2-3 more instances on the GPU. As far as I understand and have measured (on Windows), running several Julia processes won’t load the GPU further because MPS is not available, and using threads or tasks won’t get us far either, since streams are not supported by CuArrays and everything gets serialized on the default stream.

Are Linux and MPS currently the only way to fully load the GPU with CuArrays, or is there anything else (in case one’s functions are small or the code is inefficient)?

wsphillips · September 18, 2019, 6:55pm

I think it really depends on what you’re doing. Have you tried to see whether you can implement the behavior you want using the tools in CUDAnative? IIRC you can set the stream on kernel launch…

anj · September 19, 2019, 1:06pm

Yes, the first version of the code used CUDAnative. However, I really liked the simplicity of CuArrays and all the functionality one gets for “free”, notably fusion and easy testing against the CPU, so I switched to CuArrays. Now, trying to load the GPU more, I am thinking that running several streams should get us to the result faster, since I see the GPU is not 100% loaded.
