CUDA.jl - When to synchronize

The project I’m working on is run entirely on a GPU. What I mean by that is that there is no copying back and forth between the GPU and CPU until the very end. Along the way, I go through 5 main types of operations:

Array programming
Kernels (3 functions I couldn’t figure out with array programming)
mapreducedim-type function calls
Calls to cufinufft in Python (GPU version of the Flatiron Institute’s NUFFT code)
Calls to a GPU-compiled LAMMPS through LAMMPS.jl (molecular dynamics code written in C++)

I’m not quite sure where I need to put explicit synchronize commands in the code. I think that array programming synchronizes for me, so I’m thinking I may need to call synchronize before and after the kernels, the call to the cufinufft library in Python, and the call to LAMMPS. Is this correct?

Is there a general rule that I should follow, or is it case dependent?

Every time you pass data to be processed on another stream, you need to synchronize beforehand.

So, as an example, if you are passing a CuArray to a C++ library that launches CUDA operations internally, you will need to synchronize before the ccall (and at the end of the C++ code) to make sure that all operations created by Julia are finished before C++ operates on the memory, and vice versa.
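
A minimal sketch of that pattern, assuming the external library is reached through some wrapper function. Here `external_call!` is a hypothetical stand-in for the ccall, the cufinufft call, or the LAMMPS.jl call, and `with_sync` is an illustrative name, not part of CUDA.jl. The trailing `CUDA.device_synchronize()` is a conservative assumption for the case where you are unsure whether the library synchronizes its own stream before returning; if it does, that call is redundant.

```julia
using CUDA

# `external_call!` is a hypothetical stand-in for code outside CUDA.jl
# (a ccall into C++, a Python library, LAMMPS.jl, ...) that works on
# the GPU memory of `x` using its own stream.
function with_sync(external_call!, x)
    # Wait until all work queued by Julia on the current task's stream is done,
    # so the external code sees finished data.
    CUDA.synchronize()

    external_call!(x)

    # Assumption: we do not know whether the library synchronized its stream.
    # Waiting for the whole device is the conservative choice.
    CUDA.device_synchronize()
    return x
end
```

Usage would look like `with_sync(my_wrapper!, x)`, where `my_wrapper!` (hypothetical) performs the actual foreign call on `x`.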

Awesome, thank you. But the same does not apply to kernels compiled with @cuda, correct? These would not be on another stream.

Correct. In fact, with the latest version of CUDA.jl it’s not strictly required anymore to synchronize when performing operations on other streams, as CUDA.jl will synchronize for you; see the JuliaGPU blog post “CUDA.jl 5.4: Memory management mayhem”. This of course does not hold when calling out to non-CUDA.jl code.
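
To illustrate the same-stream case confirmed above, here is a small sketch in which a broadcast and a kernel launched with @cuda are queued one after the other without any explicit synchronize. The kernel `add_one!` and the function `same_stream_demo` are made up for illustration; the launch configuration (256 threads, 4 blocks) is just chosen to cover 1024 elements.

```julia
using CUDA

# Hypothetical kernel: add 1 to every element of `y`.
function add_one!(y)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] += 1f0
    end
    return nothing
end

function same_stream_demo()
    x = CUDA.zeros(Float32, 1024)
    x .= 2f0                                 # broadcast, queued on this task's stream
    @cuda threads=256 blocks=4 add_one!(x)   # queued behind the broadcast on the same stream
    # Copying to the host waits for the queued work, so no explicit
    # synchronize is needed between the steps above.
    return sum(Array(x))
end
```

Both operations run from the same Julia task, so they are ordered on that task's stream. An explicit synchronize only becomes necessary at the boundary to code that CUDA.jl does not manage.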