Using stream per cpu thread pattern

Alex_Ellison · June 7, 2019, 5:49pm

This CUDA tutorial demonstrates how multiple CPU threads can be spawned, each with its own stream, which achieves great concurrency on the GPU: GPU Pro Tip: CUDA 7 Streams Simplify Concurrency | NVIDIA Technical Blog (see the section "A Multi-threading Example"). Ultimately, I’d like to have a multi-threaded program where each thread can launch kernels on its own stream and can sync/wait for them.

I’m trying to replicate this by modifying the MWE from this other active thread, "CUDA streams do not overlap". I’m just adding

My code and output are:

using CUDAdrv, CUDAnative, CuArrays

function memcopy!(A, B)
    ix = (blockIdx().x-1) * blockDim().x + threadIdx().x
    A[ix] = B[ix]
    return
end

function main()
    nx = 128*1024^2
    nt = 100
    nthreads = 1024
    nblocks = ceil(Int, nx/nthreads)
    Threads.@threads for i = 1:2
        A = CuArray(zeros(nx))
        B = CuArray(ones(nx))
        s = CuStream()
        @cuda blocks=nblocks threads=nthreads stream=s memcopy!(A, B);
        CUDAdrv.synchronize(s)
    end
end

main()

With Threads.@threads (and only then), this produces:

Error thrown in threaded loop on thread 1: CUDAdrv.CuError(code=201, meta=nothing)
julia>

Without multiple threads, I have no problem. Is there a better way to do the multithreading, or to use the CUDAdrv/CUDAnative packages?

Thanks!

vchuravy · June 8, 2019, 9:08pm

This is currently not possible. Julia’s threading support is experimental, and the combination with CUDA is not something that currently works (as far as I know).
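
For readers landing here later: error code 201 is CUDA_ERROR_INVALID_CONTEXT in the CUDA driver API, which suggests the threads started by Threads.@threads had no current CUDA context. The sketch below is not a fix for the CUDAdrv/CUDAnative/CuArrays code in the question. It assumes the later CUDA.jl package (3.0 or newer, as far as I know), where each Julia task gets its own stream, so no explicit CuStream is created. The bounds check, the smaller default nx, the 256-thread block size, and the keyword arguments are choices made for this illustration and are not part of the original MWE.

```julia
using CUDA  # assumption: CUDA.jl 3.0 or later, not the CUDAdrv/CUDAnative/CuArrays stack

# Same element-wise copy kernel as in the question, with a bounds check added.
function memcopy!(A, B)
    ix = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if ix <= length(A)
        @inbounds A[ix] = B[ix]
    end
    return
end

function main(; nx = 1024^2, ntasks = 2)
    nthreads = 256
    nblocks = cld(nx, nthreads)
    # One Julia task per unit of work; each task launches on its own task-local stream.
    tasks = map(1:ntasks) do _
        Threads.@spawn begin
            A = CUDA.zeros(Float64, nx)
            B = CUDA.ones(Float64, nx)
            @cuda blocks=nblocks threads=nthreads memcopy!(A, B)
            CUDA.synchronize()     # waits on the current task's stream only
            all(Array(A) .== 1.0)  # copy back to the host and verify
        end
    end
    return fetch.(tasks)
end
```

Calling main() after starting Julia with more than one thread (for example `julia -t 2`) should return one Bool per task. Whether the kernels actually overlap on the GPU depends on the hardware and on how much of the device each kernel occupies, so a profiler is still needed to confirm concurrency.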
