Hi, I’m just starting to think about writing some GPU code in Julia and am trying to pin down a few things, so I would appreciate some pointers.

First, which packages are currently preferred? I have Nvidia hardware and some existing Julia code I’d like to convert, which I expect will require writing some custom kernels and performing some FFTs. It seems to me that CUDANative and GPUArrays should cover that.

Second, from my previous dabbling with GPU code outside of Julia, I already suspect that host-to-device (and device-to-host) transfer times are going to be a bottleneck. A quick Google search suggests that CUDA streams are the way to get asynchronous transfers going. CUDANative seems to support these, but I can’t see a way to get stream support with GPUArrays (or CuArrays) in order to perform FFTs. Is this possible?

Finally, are CUDANative and GPUArrays compatible with each other? Do they share common types?