I want to poll a global flag inside a CUDA kernel so that the host can shut the kernel down gracefully.
- How do I set up a global flag on the host? (I have tried abort = CuArray([false]).)
- Do I pass abort as an argument to the kernel function, and can I check its value there?
- While the kernel is running (it takes about 3 seconds; no synchronization): after 0.5 s I set abort to true on the host side, but this has no effect; the kernel carries on running to the 3-second completion.
using CUDA

a = cu([0]; unified=true)   # flag in unified memory, visible to both host and device

function kernel(b)
    # check the flag once, on kernel entry
    b[1] > 0 && (@cuprintln("ABORTED"); return nothing)
    for _ in 1:200000
        x = sqrt(2.0)       # busy work
    end
    return nothing
end

println("START")
fill!(a, 0)
@cuda threads=16 blocks=1 kernel(a)
fill!(a, 1)                 # has no effect on the kernel
println("DONE")
I’m doing something basically wrong here. Will someone put me right please?
That won't work like that; fill! issues an API call which is executed in-order, and thus waits for the kernel to complete. Either perform that call on a different stream (using CUDA.stream!, or from a different task), or use an allocation that doesn't require an API call to update. Typically that's a device-mapped host allocation, using Mem.alloc(Mem.Host, sizeof(Int), Mem.HOSTALLOC_DEVICEMAP), then wrapped to an Array using unsafe_wrap. See the exception_flag in CUDA.jl for an example use of this.
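For concreteness, here is a minimal sketch of that approach. The Mem.alloc call is the one quoted above; the Ptr/CuPtr conversions, the unsafe_wrap calls, and the polling kernel are my assumptions about how to wire the two views together, and may need adjusting for your CUDA.jl version:

using CUDA
const Mem = CUDA.Mem

# Page-locked, device-mapped host memory holding a single Int flag.
buf = Mem.alloc(Mem.Host, sizeof(Int), Mem.HOSTALLOC_DEVICEMAP)
host_flag = unsafe_wrap(Array, convert(Ptr{Int}, buf), 1)      # host-side view
dev_flag  = unsafe_wrap(CuArray, convert(CuPtr{Int}, buf), 1)  # device-side view
host_flag[1] = 0

function kernel(abort, out)
    i = 0
    while abort[1] == 0    # poll the flag; in real code a volatile/atomic read is
        i += 1             # safer, so the compiler cannot hoist the load out of the loop
        out[1] = i         # some visible "work"
    end
    @cuprintln("ABORTED")
    return nothing
end

out = CuArray([0])
@cuda threads=1 kernel(dev_flag, out)

sleep(0.5)
host_flag[1] = 1    # plain host store, no API call, so it is not queued behind the kernel
synchronize()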
Thanks for that.
I’m a newbie at CUDA.jl although I read up on CUDA several years ago.
How would I use CUDA.stream!? Would it be
CUDA.stream!(CuStream()) do; fill!(a, 1); end
Something like that should work, yes. You can also use tasks as shown here: CUDA.jl 3.0 ⋅ JuliaGPU
But if you just want to set a single flag, using a device-mapped host allocation is probably a better choice (it also guarantees that the GPU will read the flag as soon as it's set on the CPU).
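To tie the stream! suggestion together: a hedged sketch of that variant, assuming the unified-memory flag from the original post and a kernel that actually polls it inside its loop (CUDA.stream! with a do-block and per-task streams as described in the linked CUDA.jl 3.0 post):

using CUDA

a = cu([0]; unified=true)    # unified-memory flag, as in the original post
out = CuArray([0])           # dummy output so the loop is not optimized away

function kernel(abort, out)
    i = 0
    while abort[1] == 0      # poll the flag each iteration
        i += 1
        out[1] = i
    end
    @cuprintln("ABORTED")
    return nothing
end

fill!(a, 0)
@cuda threads=1 kernel(a, out)

sleep(0.5)
# Issue the flag update on its own stream, so it is not queued in-order
# behind the running kernel on the default stream:
CUDA.stream!(CuStream()) do
    fill!(a, 1)
end

# Alternatively, from a separate task (each Julia task gets its own stream in CUDA.jl):
# Threads.@spawn fill!(a, 1)

synchronize()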