How to create CUDA streams with different priorities?

Question
How can we create (non-blocking) CUDA streams with different priorities? That is, how can we do the equivalent of the following CUDA C code with CuArrays/CUDAnative/CUDAdrv?

cudaStream_t streams[2];
int leastPriority=-1, greatestPriority=-1;
cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
cudaStreamCreateWithPriority(&streams[0], cudaStreamNonBlocking, greatestPriority);
cudaStreamCreateWithPriority(&streams[1], cudaStreamNonBlocking, leastPriority);

Motivation
Given here.

Thanks!!!

Looks like these aren’t wrapped by CUDAdrv.jl yet, please open an issue there and I’ll try to have a look in one of the coming weeks.

Done, the issue is opened.

@maleadt, is there any possible quick workaround to be able to use stream priorities in Julia? Together with a colleague, I am preparing a talk for JuliaCon 2019 (in 3 weeks) on a Julia Multi-GPU real-world application (using CUDAnative.jl/CuArrays/CUDAdrv and MPI.jl). It is just missing stream priorities in order to be able to overlap communication and computation (and then scale with nearly ideal parallel efficiency to thousands of GPUs)…

julia> using CUDAdrv

julia> dev = CuDevice(0)
CuDevice(0): GeForce RTX 2080 Ti

julia> ctx = CuContext(dev)
CuContext(Ptr{Nothing} @0x000000000265ddd0, true, true)

julia> s1 = CuStream()
CuStream(Ptr{Nothing} @0x0000000002fbf770, CuContext(Ptr{Nothing} @0x000000000265ddd0, true, true))

julia> priority(s::CuStream) = (prio_ref = Ref{Cint}(); CUDAdrv.@apicall(:cuStreamGetPriority, (CUDAdrv.CuStream_t, Ptr{Cint}), s, prio_ref); prio_ref[])
priority (generic function with 1 method)

julia> priority(s1)
0

julia> function CUDAdrv.CuStream(priority::Integer, flags::CUDAdrv.CUstream_flags=CUDAdrv.STREAM_DEFAULT)
           handle_ref = Ref{CUDAdrv.CuStream_t}()
           CUDAdrv.@apicall(:cuStreamCreateWithPriority , (Ptr{CUDAdrv.CuStream_t}, Cuint, Cint),
                                                          handle_ref, flags, priority)

           ctx = CuCurrentContext()
           obj = CuStream(handle_ref[], ctx)
           finalizer(CUDAdrv.unsafe_destroy!, obj)
           return obj
       end

julia> s2 = CuStream(-1)
CuStream(Ptr{Nothing} @0x0000000003934530, CuContext(Ptr{Nothing} @0x000000000265ddd0, true, true))

julia> priority(s2)
-1

That looks pretty much like what I need, thanks! I believe the only thing I am still missing is how to get the valid range of values that priority can take, i.e. the equivalent of the following CUDA C code:

int leastPriority=-1, greatestPriority=-1;
cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

Adapt the priority function from above; it’s pretty similar in that it returns an integer through an argument.

While cuStreamGetPriority and cuStreamCreateWithPriority are part of the NVIDIA CUDA driver API (see here), cudaDeviceGetStreamPriorityRange is part of the NVIDIA CUDA runtime API (see here). Consequently, the following code

priorityRange() = (r1_ref = Ref{Cint}(); r2_ref = Ref{Cint}(); CUDAdrv.@apicall(:cudaDeviceGetStreamPriorityRange, (Ptr{Cint}, Ptr{Cint}), r1_ref, r2_ref); (r1_ref[], r2_ref[]))

fails as expected with the error:

julia> priorityRange()
ERROR: ccall: could not find function cudaDeviceGetStreamPriorityRange in library /opt/cray/nvidia/default/lib64/libcuda.so
Stacktrace:
 [1] macro expansion at /users/omlins/.julia/packages/CUDAdrv/3cR2F/src/base.jl:142 [inlined]
 [2] priorityRange() at ./REPL[5]:1
 [3] top-level scope at none:0

How do you call the runtime API? Do I need the CUDAapi package? Is it still being developed and meant to be used together with CuArrays/CUDAnative and CUDAdrv?

Thanks!

Don’t use the runtime API, call cuCtxGetStreamPriorityRange from the driver API instead.

Oh sorry, I did not see this function. With that change, i.e. defining

priorityRange() = (r1_ref = Ref{Cint}(); r2_ref = Ref{Cint}(); CUDAdrv.@apicall(:cuCtxGetStreamPriorityRange, (Ptr{Cint}, Ptr{Cint}), r1_ref, r2_ref); (r1_ref[], r2_ref[]))

it works:

julia> priorityRange()
(0, -1)
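
Putting the pieces of this thread together, the Julia counterpart of the CUDA C snippet in the question might look roughly like the sketch below. It assumes the CUDAdrv version used above (where `CUDAdrv.@apicall`, `CUDAdrv.CuStream_t`, `CUDAdrv.unsafe_destroy!` and the `CUDAdrv.STREAM_NON_BLOCKING` flag are available); `priority`, `priorityRange` and the `CuStream(priority, flags)` method are the helpers defined in this thread, not part of CUDAdrv itself.

```julia
using CUDAdrv

# Activate a context on the first device, as in the REPL session above.
dev = CuDevice(0)
ctx = CuContext(dev)

# Helper from this thread (not part of CUDAdrv): query a stream's priority.
function priority(s::CuStream)
    prio_ref = Ref{Cint}()
    CUDAdrv.@apicall(:cuStreamGetPriority, (CUDAdrv.CuStream_t, Ptr{Cint}),
                     s, prio_ref)
    return prio_ref[]
end

# Helper from this thread: valid priority range of the current context,
# returned as (least, greatest). Numerically lower means higher priority.
function priorityRange()
    least_ref = Ref{Cint}()
    greatest_ref = Ref{Cint}()
    CUDAdrv.@apicall(:cuCtxGetStreamPriorityRange, (Ptr{Cint}, Ptr{Cint}),
                     least_ref, greatest_ref)
    return (least_ref[], greatest_ref[])
end

# Constructor from this thread: create a stream with a given priority.
function CUDAdrv.CuStream(priority::Integer,
                          flags::CUDAdrv.CUstream_flags=CUDAdrv.STREAM_DEFAULT)
    handle_ref = Ref{CUDAdrv.CuStream_t}()
    CUDAdrv.@apicall(:cuStreamCreateWithPriority,
                     (Ptr{CUDAdrv.CuStream_t}, Cuint, Cint),
                     handle_ref, flags, priority)
    obj = CuStream(handle_ref[], CuCurrentContext())
    finalizer(CUDAdrv.unsafe_destroy!, obj)
    return obj
end

# Counterpart of the two cudaStreamCreateWithPriority calls in the question:
# one non-blocking stream at the greatest and one at the least priority.
least, greatest = priorityRange()
streams = [CuStream(greatest, CUDAdrv.STREAM_NON_BLOCKING),
           CuStream(least,    CUDAdrv.STREAM_NON_BLOCKING)]

# Check which priorities the driver actually assigned.
priority.(streams)
```

On the device above, where `priorityRange()` returns `(0, -1)`, the first stream would be requested with priority -1 and the second with priority 0.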

@maleadt, is there a quick way to do a 3-D memcopy in Julia now? I would like to use it to further improve the halo update of our real-world application before our talk at JuliaCon next week…
Thanks!

UPDATE: this message was not meant to be in this thread; I will post it to the correct thread with @maleadt’s reply…

No need to post in a different thread. Have a look at https://github.com/JuliaGPU/CUDAdrv.jl/issues/149#issuecomment-511338109