Question
How can we create (nonblocking) CUDA streams with different priorities? That is, how can we do with CuArrays/CUDAnative/CUDAdrv the equivalent of the following lines of CUDA code?
@maleadt, is there a quick workaround that would let us use stream priorities in Julia? Together with a colleague, I am preparing a talk for JuliaCon 2019 (in 3 weeks) on a real-world multi-GPU Julia application (using CUDAnative.jl/CuArrays/CUDAdrv and MPI.jl). The only thing missing is stream priorities, which we need in order to overlap communication and computation (and then scale with nearly ideal parallel efficiency to thousands of GPUs)…
That looks pretty much like what I need, thanks! I believe the only thing I am still missing is how to get the valid range of values that priority can take, i.e. the equivalent of the following CUDA C code:
int leastPriority=-1, greatestPriority=-1;
cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
While cuStreamGetPriority and cuStreamCreateWithPriority are part of the NVIDIA CUDA driver API (see here), cudaDeviceGetStreamPriorityRange is part of the NVIDIA CUDA runtime API (see here). Consequently, calling it through CUDAdrv fails, because the symbol is looked up in libcuda.so:
julia> priorityRange()
ERROR: ccall: could not find function cudaDeviceGetStreamPriorityRange in library /opt/cray/nvidia/default/lib64/libcuda.so
Stacktrace:
[1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/3cR2F/src/base.jl:142 [inlined]
[2] priorityRange() at ./REPL[5]:1
[3] top-level scope at none:0
How do you call the runtime API? Do I need the CUDAapi package? Is it still being developed, and is it meant to be used together with CuArrays/CUDAnative and CUDAdrv?
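A possible workaround, sketched under assumptions: the driver API has its own counterpart, cuCtxGetStreamPriorityRange, which lives in libcuda.so, so it can be reached without the runtime library. The snippet below uses a plain ccall rather than any CUDAdrv wrapper. The function name priority_range is made up for illustration, and the sketch assumes that libcuda can be found by the loader and that a CUDA context is already active on the calling thread (for example, one created by CUDAdrv or CuArrays).

```julia
# Hypothetical helper (not part of CUDAdrv): query the stream priority range
# of the current context through the CUDA driver API.
# Assumes libcuda is on the loader path and a context is already active.
function priority_range()
    least = Ref{Cint}(-1)
    greatest = Ref{Cint}(-1)
    # CUresult cuCtxGetStreamPriorityRange(int* leastPriority, int* greatestPriority)
    status = ccall((:cuCtxGetStreamPriorityRange, "libcuda"), Cint,
                   (Ref{Cint}, Ref{Cint}), least, greatest)
    # CUDA_SUCCESS is 0; any other value is a driver error code.
    status == 0 || error("cuCtxGetStreamPriorityRange failed with CUresult $status")
    return least[], greatest[]
end

least, greatest = priority_range()
```

Note that in CUDA lower numbers mean higher priority, so greatest is normally the numerically smaller of the two values; it is the one to pass when creating the high-priority stream.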
@maleadt, is there a quick way to do a 3-D memory copy in Julia now? I would like to use it to further improve the halo update of our real-world application before our talk at JuliaCon next week…
Thanks!
UPDATE: this message was not meant to be in this thread; I will post it to the correct thread with @maleadt’s reply…