Why can Channels not be used by threads?

I am using Channels to synchronize and distribute workloads between threads (and it is working really well).

However, the documentation about @threads states that the assumptions that go into the threading model mean that

“Communicating between iterations using blocking primitives like Channels is incorrect.”

Why is this so? Why is it wrong to use channels with threads?

If it’s “blocking”, two different iterations (and thus two different threads) can’t work at the same time, so what’s the point?

I mean, I have ensured that there are no race conditions or deadlocks, and the threads very much do work in parallel at the same time (the channels are used for only a very short time in each iteration). So… did I just misread the documentation, and all it is saying is that you shouldn’t create deadlocks or race conditions, and that you need to be a tiny bit careful to still run in parallel? Because the way it is worded seems to suggest some deep technical incompatibility between threads and channels.

If I were to hazard a guess, the problem is that the @threads macro does not guarantee that all iterations will be scheduled simultaneously. Thus you could end up in a situation where iteration i calls take! on an empty channel, blocking until someone else calls put! on it—but iteration j, which is supposed to do the put!, won’t be scheduled until after iteration i is finished. Maybe the scheduler decided to run iterations i and j sequentially in the same task.

This is an issue specifically with the @threads macro, not with combining channels and threads in general. You should be safe if you replace

Threads.@threads for i in I
    # stuff
end

with

@sync for i in I
    Threads.@spawn begin
        # stuff
    end
end
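
To make the difference concrete, here is a minimal sketch of iterations that depend on each other through channels. The function name handoff and the chain pattern are made up for illustration. Each iteration waits for a value from the next one, so it only finishes if the other iterations are really running as separate tasks, which is what @sync/@spawn gives you. With Threads.@threads, the same body could hang if iterations i and i+1 land in the same task.

```julia
# Hypothetical example: iteration i waits for a value from iteration i+1.
function handoff(n::Int)
    # chans[i] carries one value from iteration i+1 to iteration i (unbuffered)
    chans = [Channel{Int}(0) for _ in 1:n]
    out = zeros(Int, n)
    @sync for i in 1:n
        Threads.@spawn begin
            if i < n
                out[i] = take!(chans[i])   # blocks until iteration i+1 calls put!
            end
            # hand our own index to the previous iteration, if there is one
            i > 1 && put!(chans[i-1], i)
        end
    end
    return out
end

handoff(4)   # returns [2, 3, 4, 0]
```

This works even with a single thread, because a task that blocks in take! or put! yields to the other spawned tasks.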

Ah, thanks, that would make sense.

I guess, to ensure no deadlocks, I would need to know whether two tasks happen to run on the same thread. That is, can I ensure that nthreads() tasks each run on a different thread?

No, this is not necessary. Tasks on the same thread can yield to each other just like any other asynchronous tasks. Tasks started with Threads.@spawn and Threads.@threads [:dynamic] are not even pinned to a specific thread, and can and will migrate between threads during execution.

The gotcha when using @threads is that it does not guarantee a one-to-one correspondence between tasks and loop body iterations, and it does not guarantee that all tasks are scheduled concurrently. It may combine multiple iterations into a single task, and it may choose to wait until some tasks are finished before starting others. That’s why the documentation says that the loop body for each iteration “must be able to make forward progress independent of other iterations”.
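
You can see the grouping for yourself with a small experiment. This is only a sketch (the function name tasks_per_iteration is made up), and the exact numbers depend on your Julia version, the schedule, and how many threads you started Julia with, so I am not quoting an output.

```julia
# Record which task executed each iteration of a @threads loop.
function tasks_per_iteration(n::Int)
    ids = Vector{UInt}(undef, n)
    Threads.@threads for i in 1:n
        ids[i] = objectid(current_task())   # identity of the task running iteration i
    end
    return ids
end

ids = tasks_per_iteration(100)
println("iterations: ", length(ids), ", distinct tasks: ", length(unique(ids)))
```

If the number of distinct tasks is smaller than the number of iterations, several iterations shared a task and ran one after another, which is exactly the situation in which a blocking take! can wait forever.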

If you spawn exactly nthreads() tasks with Threads.@spawn, chances are they will occupy one thread each.
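
For the workload-distribution use case, a rough sketch of that pattern could look like the following. The function name run_workers and the Int job type are placeholders, not anything from your code. It spawns nthreads() worker tasks that pull jobs from one channel and report on another.

```julia
# Hypothetical worker pool: nthreads() tasks draining a shared job channel.
function run_workers(jobs_in::AbstractVector{Int})
    jobs = Channel{Int}(length(jobs_in))
    foreach(j -> put!(jobs, j), jobs_in)
    close(jobs)                          # workers stop once the buffer is drained
    results = Channel{Tuple{Int,Int}}(length(jobs_in))
    @sync for _ in 1:Threads.nthreads()
        Threads.@spawn for j in jobs     # iterating a Channel calls take! internally
            # threadid() is only a snapshot; the task may migrate later
            put!(results, (j, Threads.threadid()))
        end
    end
    close(results)
    return collect(results)              # (job, thread id) pairs in completion order
end

run_workers(collect(1:10))
```

Which thread ends up handling which job is up to the scheduler, so the thread ids in the result are informational only.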

I think a lot of what you’re missing is that threads share memory so you don’t need channels. You can just use a regular array that you push and pop from (with appropriate locking).

Isn’t a Channel just this but with the locking built-in and automated? From what I can tell it’s made for exactly this purpose (message-passing and synchronization between asynchronous/multithreaded tasks—not to be confused with RemoteChannel, which is for use with multiprocessing). The manual array-and-lock approach runs into exactly the same issues with @threads.

Yes, but as you pointed out, it has blocking semantics, which are not always acceptable or desirable. push! and pop! on Array don’t block, and pop! throws if you try to remove an element from an empty array.
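
As a sketch of what I mean by “appropriate locking”: the type LockedStack and the helpers push_item! and try_pop! below are made-up names, not part of Base. The point is that try_pop! returns nothing on an empty container instead of blocking like take! or throwing like a bare pop!.

```julia
# Hypothetical lock-protected stack with a non-blocking pop.
struct LockedStack{T}
    items::Vector{T}
    lock::ReentrantLock
end
LockedStack{T}() where {T} = LockedStack{T}(T[], ReentrantLock())

# Push under the lock so concurrent tasks cannot corrupt the vector.
push_item!(s::LockedStack, x) = lock(() -> push!(s.items, x), s.lock)

# Returns `nothing` instead of blocking or throwing when the stack is empty.
function try_pop!(s::LockedStack)
    lock(s.lock) do
        isempty(s.items) ? nothing : pop!(s.items)
    end
end

s = LockedStack{Int}()
@sync for i in 1:4
    Threads.@spawn push_item!(s, i)
end
popped = [try_pop!(s) for _ in 1:5]   # four values in some order, then `nothing`
```

Whether returning nothing is better than blocking depends on the algorithm: a caller that gets nothing has to decide for itself whether to retry, do other work, or stop.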