Error handling in CUDA kernels

In my CUDA kernel I check for a specific condition and would like to terminate the execution if the condition is fulfilled. What is the proper way to do so? How can I throw an error from inside a CUDA kernel?

You can just call error(""), but the user will only see the specific error message when they run with julia -g2.
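
For illustration, here is a minimal sketch of throwing from inside a kernel. The kernel name check_kernel and the input data are made up for the example, and the exact exception type and printed output depend on the CUDA.jl version, so the sketch only records that something was thrown on the host.

```julia
using CUDA

# Hypothetical kernel that rejects negative input by throwing.
function check_kernel(x)
    i = threadIdx().x
    if x[i] < 0
        error("negative input")
    end
    return nothing
end

x = CuArray(Float32[1, -2, 3, 4])

# Kernel launches are asynchronous, so the failure reaches the host
# only once we synchronize.
threw = try
    @cuda threads=length(x) check_kernel(x)
    synchronize()
    false
catch
    true
end
```

Without -g2 the host should only get a generic report that an exception occurred, not the "negative input" text.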

Thank you. It works.

In a normal (non-debug) run, the thrown error is accompanied by a continuously repeating message:

ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.

I guess there is a typo in this message: it should be “an exception” instead of “a exception”.

The proper solution is to restructure your control flow so that you can do an early return. One alternative is to emit the exit PTX instruction via LLVM.jl’s @asmcall, but not every GPU supports that instruction, and we’ve encountered miscompilations in the presence of such control flow.

I can do the early return. But how can I signal that the return was triggered by an error? In normal functions I can return a variable as an error code, but CUDA kernels cannot return anything.

You can allocate a global flag and write to it from your kernel. This can be a single-element CuArray you pass as an argument, or something fancier (e.g., look for exception_flag in CUDA.jl, that’s a global flag allocated in CPU memory mapped into GPU address space so that we can more easily read the value without synchronizing the GPU).
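
A minimal sketch of the single-element CuArray variant, combined with an early return. The kernel double_kernel!, the rule that negative input is invalid, and the flag encoding (0 means no error) are assumptions made up for the example, not anything CUDA.jl prescribes.

```julia
using CUDA

# Hypothetical kernel: doubles each element and treats negative input as an error.
function double_kernel!(out, inp, err_flag)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    i > length(inp) && return nothing
    x = inp[i]
    if x < 0
        # Every failing thread stores the same value, so the race is harmless.
        err_flag[1] = Int32(1)
        return nothing          # early return instead of throwing
    end
    out[i] = 2 * x
    return nothing
end

inp = CuArray(Float32[1, 4, -9, 16])
out = CUDA.zeros(Float32, length(inp))
err_flag = CUDA.zeros(Int32, 1)   # single-element flag, 0 means "no error"

@cuda threads=256 double_kernel!(out, inp, err_flag)

# Copying the flag back to the host waits for the kernel to finish.
failed = Array(err_flag)[1] != 0
failed && @warn "kernel reported invalid input"
```

Note that the flag only reports the problem: threads that did not hit the condition still run to completion, so the host has to decide what to do with the partial output.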

Thank you. I will try.