Synchronize streams in CUDA.jl

I wouldn’t do that. It’s possible, but the easier route is to change the stream for that task / @async block by calling stream! once (without the do-block syntax).
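Roughly something like this (a minimal sketch; the array, its size, and the broadcast are just illustrative):

```julia
using CUDA

# Give the task its own stream once, up front, instead of wrapping every
# operation in stream!() do ... end.
a = CUDA.rand(Float32, 1024)

t = @async begin
    s = CuStream()    # a dedicated stream for this task
    stream!(s)        # every subsequent CUDA call on this task now uses s
    a .= 2f0 .* a     # e.g. a broadcast kernel, launched on s
    synchronize(s)    # wait for the work queued on this stream
end
wait(t)
```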

Yeah, that could be problematic, as it will cause additional synchronization. That assumption (that an array is only used from a single stream at a time) is currently baked into CUDA.jl’s memory handling: CUDA.jl/src/memory.jl at 76e2972814a0e7910f35ed3ad17b1a9198628f34 · JuliaGPU/CUDA.jl · GitHub

What’s the reason for sharing arrays between different streams? Do you really have several kernels operating on subsets of the memory, and then copy out the pieces that are ready?
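Concretely, I imagine something like the following (a hedged sketch of that pattern, not code from this thread; the sizes, chunking, and two-task setup are made up):

```julia
using CUDA

# Several tasks, each on its own stream, each filling a disjoint chunk of a
# single shared array.
a = CUDA.zeros(Float32, 1_000_000)

tasks = map(1:2) do i
    @async begin
        s = CuStream()
        stream!(s)                                    # this task's own stream
        chunk = (1 + (i - 1) * 500_000):(i * 500_000) # disjoint subset of a
        a[chunk] .= Float32(i)                        # kernel launched on s
        synchronize(s)                                # this chunk is now ready
    end
end
foreach(wait, tasks)

# Because the single buffer `a` is shared across streams, CUDA.jl's memory
# handling (the linked memory.jl) may insert additional synchronization.
```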