Package use, CUDA stream support, etc

drinkwell · September 12, 2018, 4:21pm

Hi, I’m just starting to think about writing some Julia GPU code and am trying to pin down a few things, so I would appreciate some pointers. First off, what are the currently preferred packages to use? I have Nvidia hardware and some existing Julia code I’d like to convert, which I think will probably require writing some custom kernels and performing some FFTs. To me it seems that CUDAnative and GPUArrays should be able to cover that. Secondly, from my previous dabblings with GPU code outside of Julia, I already suspect that host -> device (and vice versa) transfer times are going to be a bottleneck. A quick google shows that CUDA streams are the way to get async transfers going. CUDAnative seems to support these, however I can’t see a way of getting stream support with GPUArrays (or CuArrays) to be able to perform FFTs. Is this possible? Finally, are CUDAnative and GPUArrays cross-compatible? Do they share common types?

Thanks

rveltz · September 12, 2018, 6:49pm

Hi,

If you have NVIDIA hardware, you might consider CuArrays.jl, which is one of the backends of GPUArrays. It supports FFTs (have a look at the tests for an example).

Raf · September 13, 2018, 1:02am

CuArrays depends on CUDAnative, so yes, they work together:

https://github.com/JuliaGPU/CuArrays.jl/blob/master/REQUIRE

maleadt · September 13, 2018, 5:55am

CUDAnative if you need the granularity, CuArrays if you are fine with array-based abstractions (which are pretty powerful, as they support broadcasting arbitrary user code). GPUArrays is the vendor-neutral version of these packages which has CuArrays/CUDAnative as a back-end but with vendor-neutral abstractions (covering fewer features though).

Streams and asynchronous transfers are both supported by CUDAdrv/CUDAnative, but not by CuArrays yet. If you have compelling use cases, or ideas about API design, feel free to file issues on the CuArrays.jl repository.

Yes, if you pass a GPUArray (CuArray, specifically) to a @cuda invocation, all should work. The vendor-neutral alternative here is to use GPUArrays’ gpu_call, but as I said, not all CUDAnative features are covered by GPUArrays’ kernel programming API.

drinkwell · September 13, 2018, 2:41pm

Thanks for the answers. I might have a think about how to add stream support to CuArrays - would it be possible to have a macro that set the stream number for a given code block? Could a similar mechanism to the array bounds checking macros be used, or do they require compiler support? In the meantime, I’m hoping I can use CuArrays as a reference for calling cuFFT directly but add in the code to specify streams. Hopefully I just need to add a call to cufftSetStream.

maleadt · September 13, 2018, 2:59pm

I haven’t really put much thought into it either, but we’d probably need to maintain a global current_stream (that defaults to the default stream), and pass that around appropriately (i.e. to all libraries we wrap). Or switch streams for the libraries when the user requests it. It’d be smart to have a look at how other array libraries tackle this (e.g. ArrayFire).

“would it be possible to have a macro that set the stream number for a given code block”

Sure, a do-block would probably be sufficient though.
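
For illustration, the global current_stream plus do-block idea could look roughly like the sketch below. This is plain Julia with no GPU packages involved: MockStream, DEFAULT_STREAM, with_stream and fake_fft_stream_id are all hypothetical names made up for this example and are not part of CuArrays, CUDAdrv or CUDAnative. A real implementation would hold an actual stream handle and forward it to each wrapped library (for cuFFT, presumably via cufftSetStream as mentioned above).

```julia
# Hypothetical stand-in for a CUDA stream handle; not a type from any GPU package.
struct MockStream
    id::Int
end

# Assumed convention for this sketch: id 0 plays the role of the default stream.
const DEFAULT_STREAM = MockStream(0)

# Global "current stream" that wrapped libraries would consult.
const CURRENT_STREAM = Ref(DEFAULT_STREAM)

current_stream() = CURRENT_STREAM[]

# Run `f` with `stream` as the current stream and restore the previous
# stream afterwards, even if `f` throws.
function with_stream(f, stream::MockStream)
    previous = CURRENT_STREAM[]
    CURRENT_STREAM[] = stream
    try
        return f()
    finally
        CURRENT_STREAM[] = previous
    end
end

# Hypothetical library wrapper: a real one would pass current_stream()
# on to the library before launching any work.
fake_fft_stream_id() = current_stream().id

# The do-block form: everything inside runs with stream 1 as current.
inside = with_stream(MockStream(1)) do
    fake_fft_stream_id()
end
```

Because the function argument comes first in with_stream, Julia's do-block syntax works without any macro or compiler support. Note that a single global like this is not safe when several tasks or threads switch streams concurrently; task-local storage would be one way to address that.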
