Does writing in parallel block the whole process?

Say I have a large piece of data, `data`, that I want to write or serialize (for speed) to a file, `file`, with `serialize("path/to/file", data)`. Done naively, this will block the rest of the program from executing. So it’s simple enough to create a new task via `@spawn serialize("path/to/file", data)` so the Julia scheduler will execute it on the next available thread.
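For concreteness, here is a minimal sketch of that pattern. The payload and the temporary path are made up for illustration; `serialize` and `deserialize` come from the `Serialization` standard library, and `Threads.@spawn` is the fully qualified form of `@spawn`.

```julia
using Serialization

# Hypothetical payload and output path, for illustration only.
data = rand(1000, 1000)
path = joinpath(mktempdir(), "data.jls")

# Start the write in a new task; the scheduler may run it on any available thread.
task = Threads.@spawn serialize(path, data)

# ... other work can proceed here while the write is in flight ...

# Wait for the write to finish before relying on the file contents.
wait(task)

# Read the data back to confirm the round trip.
restored = deserialize(path)
```

Calling `wait(task)` (or `fetch(task)`) also rethrows any error raised inside the task, so a failed write does not go unnoticed.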

Now, I happened to read on the Wikipedia page for “Green Threads” (e.g. Julia’s Tasks):

When a green thread executes a blocking system call, not only is that thread blocked, but all of the threads within the process are blocked.[5] To avoid that problem, green threads must use non-blocking I/O or asynchronous I/O operations, although the increased complexity on the user side can be reduced if the virtual machine implementing the green threads spawns specific I/O processes (hidden to the user) for each I/O operation.[citation needed]

So now I’m curious: does serialization execute a blocking system call? If so, does running it in another thread count as non-blocking or asynchronous I/O? Does LLVM take care of these aspects? Or, more directly, does saving data in parallel actually block the whole process?

The serialize function uses Julia’s standard asynchronous I/O routines (write etc.), based on the libuv library, so it does not block the whole process.


To quote Jeff Goldblum,

Well, there it is.

Thank you @stevengj !

There’s some fine print worth noting here: All libuv stuff, including IO, is handled by a dedicated task that’s pinned to the main thread. The thread calling serialize will block until it’s been able to wake up the libuv task and hand off the IO. This is usually quick, unless there’s a long-running, non-yielding task running on the main thread and blocking the libuv task. So to make sure everything flows smoothly, make sure that all your tasks regularly hit yield points, if necessary by inserting explicit yield() calls in long-running, non-allocating loops.
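As a rough sketch of that last suggestion, a long-running, non-allocating loop can call `yield()` every so often. The function name `busy_sum` and the `yield_every` interval are hypothetical, chosen only for illustration; the right interval depends on how expensive each iteration is.

```julia
# Hypothetical long-running, non-allocating loop. Without the explicit yield(),
# it would never reach a yield point and could starve other tasks on its thread.
function busy_sum(n; yield_every = 1_000_000)
    total = 0
    for i in 1:n
        total += i % 7
        # Periodically give the scheduler a chance to run other tasks.
        i % yield_every == 0 && yield()
    end
    return total
end

result = busy_sum(5_000_000)
```

Yielding on every iteration would also work but adds scheduler overhead, which is why the sketch only yields once per `yield_every` iterations.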
