Hi, I’m just starting to think about writing some Julia GPU code and am trying to bottom out a few things, so I would appreciate some pointers.

First off, what are the currently preferred packages to use? I have NVIDIA hardware and some existing Julia code I’d like to convert, which I think will probably require writing some custom kernels and performing some FFTs. To me it seems that CUDAnative and GPUArrays should be able to cover that.

Secondly, from my previous dabblings with GPU code outside of Julia, I already suspect that host-to-device (and vice versa) transfer times are going to be a bottleneck. A quick Google search suggests that CUDA streams are the way to get async transfers going. CUDAnative seems to support these, but I can’t see a way of getting stream support with GPUArrays (or CuArrays) to be able to perform FFTs. Is this possible?

Finally, are CUDAnative and GPUArrays cross-compatible? Do they share common types?
If you have NVIDIA hardware, you might consider CuArrays.jl, which is one of the backends of GPUArrays. It supports FFTs (have a look at the tests for an example).
CuArrays depends on CUDAnative, so yes, they work together: use CUDAnative if you need the granularity, and CuArrays if you are fine with array-based abstractions (which are pretty powerful, as they support broadcasting arbitrary user code). GPUArrays is the vendor-neutral layer: it can use CuArrays/CUDAnative as a back-end, but its abstractions cover fewer features.
Streams and asynchronous transfers are both supported by CUDAdrv/CUDAnative, but not by CuArrays yet. If you have compelling use cases, or ideas about API design, feel free to file issues on the CuArrays.jl repository.
Yes, if you pass a GPUArray (CuArray, specifically) to a @cuda invocation, all should work. The vendor-neutral alternative here is to use GPUArrays’ gpu_call, but as I said, not all CUDAnative features are covered by GPUArrays’ kernel programming API.
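For illustration, here is a minimal sketch of passing CuArrays to a hand-written kernel. It assumes a CUDAnative release that accepts the keyword launch syntax (threads=, blocks=) and a working CUDA GPU; the kernel name vadd! and the sizes are made up for this example.

```julia
using CuArrays, CUDAnative

# Hypothetical example kernel: element-wise c = a + b.
function vadd!(c, a, b)
    # Global 1-based index of this thread across all blocks.
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing  # kernels must not return a value
end

n = 1024
a = CuArray(rand(Float32, n))   # host -> device copy
b = CuArray(rand(Float32, n))
c = similar(a)                  # uninitialised device array

# 4 blocks of 256 threads cover all 1024 elements.
@cuda threads=256 blocks=4 vadd!(c, a, b)

# Copy back to the host to inspect the result.
host_c = Array(c)
```

The same CuArray objects can be used with broadcasting (for example c .= a .+ b), so custom kernels are only needed where the array abstractions fall short.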
Thanks for the answers. I might have a think about how to add stream support to CuArrays - would it be possible to have a macro that set the stream number for a given code block? Could a mechanism similar to the array bounds checking macros be used, or do those require compiler support? In the meantime, I’m hoping I can use CuArrays as a reference for calling cuFFT directly, adding the code to specify streams myself. Hopefully I just need to add a call to cufftSetStream.
I haven’t really put much thought into it either, but we’d probably need to maintain a global current_stream (initialised to the default stream) and pass that around appropriately (i.e. to all the libraries we wrap). Alternatively, we could switch streams for the libraries when the user requests it. It’d be smart to have a look at how other array libraries tackle this (e.g. ArrayFire).
> would it be possible to have a macro that set the stream number for a given code block
Sure, a do-block would probably be sufficient though.
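As a rough sketch of the global current_stream plus do-block idea, in plain Julia so it runs without a GPU: none of the names below (FakeStream, with_stream, fake_fft) exist in CuArrays or CUDAdrv, they are placeholders for whatever the real API would end up being.

```julia
# Hypothetical sketch only: these names are NOT part of CuArrays/CUDAdrv.

# Stand-in for a real stream handle.
struct FakeStream
    id::Int
end

const DEFAULT_STREAM = FakeStream(0)

# Global "current stream", initialised to the default stream.
const current_stream = Ref(DEFAULT_STREAM)

# Run `f` with `stream` as the current stream, restoring the previous
# stream afterwards even if `f` throws.
function with_stream(f, stream::FakeStream)
    previous = current_stream[]
    current_stream[] = stream
    try
        return f()
    finally
        current_stream[] = previous
    end
end

# Stand-in for a wrapped library call (e.g. an FFT) that would forward
# the current stream to the library, as cufftSetStream does for cuFFT.
fake_fft(x) = (stream = current_stream[].id, data = x)

s = FakeStream(1)
result = with_stream(s) do
    fake_fft([1.0, 2.0])   # sees stream 1 inside the block
end
```

Because with_stream takes the function as its first argument, the do-block form works without any macro or compiler support. A single global like this is not safe if several tasks switch streams concurrently, so a real design would likely need per-task state.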