Occasional long delays in CUDA.jl

From talking with @vchuravy, I think the limitation is “just” I/O performed through libuv, because of its single-threaded event loop. So, IIUC, channels, locks, etc. are not subject to this restriction, but here we’re using AsyncCondition and Timer, both of which are implemented on top of libuv.
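To make the distinction concrete, here is a minimal sketch of handing a result from a worker thread back to a waiting task with a Channel instead of an AsyncCondition. The helper name `run_on_worker` is hypothetical and not part of CUDA.jl, and it assumes (as stated above, not verified here) that Channel sits on Julia's task scheduler rather than on a libuv handle. In a real setting `f` would be the blocking synchronization call.

```julia
# Hypothetical helper (not CUDA.jl API): run the blocking function `f` on a
# worker thread and wait for its result through a Channel.
function run_on_worker(f)
    # `spawn=true` lets the producer task run on any available thread.
    # The Channel is bound to that task, so if `f` throws, the channel is
    # closed with the error and `take!` rethrows it instead of hanging.
    ch = Channel{Any}(1; spawn=true) do c
        put!(c, f())
    end
    # Suspends only the calling task; other tasks keep being scheduled.
    return take!(ch)
end

result = run_on_worker(() -> sum(1:10))
```

With a single Julia thread this still works; the spawned task simply runs on the same thread once the caller blocks in `take!`.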

Sorry, I’m not familiar enough with libuv to know what could cause this. Maybe you’re calling into a system library from the libuv thread, and that library is then making a blocking syscall?

To solve your issue, we could either introduce an environment variable that disables the nonblocking synchronization, or rework it so that it doesn’t rely on libuv. For the latter, the simplest approach that comes to mind is a busy loop that calls yield() (to give other tasks the opportunity to execute), perhaps with an exponentially increasing delay and a sleep() so as not to hog the CPU too much (although CUDA’s default synchronization behavior is to spin as well).
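A rough sketch of that busy loop, assuming a non-blocking completion check is available: `spin_wait` and `isdone` are hypothetical names, where `isdone` stands in for something like a wrapper around the driver's `cuStreamQuery`. One design caveat: `Base.sleep` is itself implemented with a libuv `Timer`, so this sketch uses `Libc.systemsleep` for the backoff instead. That avoids libuv but blocks the OS thread for the (short, capped) delay.

```julia
# Hypothetical helper (not CUDA.jl API): poll `isdone()` until it returns true.
function spin_wait(isdone::Function; min_delay::Real = 1e-6, max_delay::Real = 1e-3)
    delay = float(min_delay)
    while !isdone()
        yield()                          # let other tasks run between polls
        isdone() && break                # re-check before backing off
        # OS-level sleep: no libuv Timer involved, but it does block this thread.
        Libc.systemsleep(delay)
        delay = min(2delay, max_delay)   # exponential backoff, capped at max_delay
    end
    return nothing
end

# Usage with a stand-in predicate that becomes true on its fifth call:
calls = Ref(0)
spin_wait(() -> (calls[] += 1) >= 5)
```

Capping the delay keeps the worst-case extra latency bounded (here about 1 ms), which matters if the goal is to avoid long, occasional stalls.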