I’m new to asynchronous programming and have a use case that I’m not sure how to express in terms of Julia concepts. I’d greatly appreciate any suggestions others may have to offer.
I have a task A that consists of concurrent subtasks A1, A2, …, An. Each subtask has a finite sequence of things to produce for a global consumer B, but at various points along its sequence it will need to wait for a response from B before continuing. However, B cannot interact with each subtask independently; due to interactions between the things produced by A1, …, An, B needs to wait until all the subtasks have reached a point where they are waiting for a response from B. When this happens, B will process everything they have produced in the meantime, then return responses to the respective subtasks so that they can continue. This back-and-forth continues until all the subtasks are finished, at which point A and B both conclude.
The way I am thinking of the core interaction is that A1, …, An collectively yield to B, and then B yields back collectively to A1, …, An, each one resuming where it left off. But I am not sure if that is a helpful way to think of it.
Any suggestions?
A pragmatic solution is to have two channels, one for work orders and one for work completions:
```
mutable struct CyclicBarrierChannel
    nWorkers::Int
    jobs::Channel{Nothing}        # boss -> workers: "continue"
    completions::Channel{Nothing} # workers -> boss: "done and waiting"
end

CyclicBarrierChannel(n) = CyclicBarrierChannel(n, Channel{Nothing}(n), Channel{Nothing}(n))

function workerDoneAndWait(barrier)
    put!(barrier.completions, nothing) # hand in the completion receipt
    take!(barrier.jobs)                # block until the boss hands out new work
end

function bossWaitForWorkers(barrier)
    for i = 1:barrier.nWorkers
        take!(barrier.completions)
    end
end

function bossReleaseWorkers(barrier)
    for i = 1:barrier.nWorkers
        put!(barrier.jobs, nothing)
    end
end
```
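To see the back-and-forth in action, here is a hypothetical end-to-end demo: each worker Ai produces one item per round and then parks at the barrier; the boss B drains everything once all workers have arrived, then releases them. The definitions are repeated so the sketch runs standalone; `demo`, `produced`, and `log` are made-up names for illustration.

```julia
mutable struct CyclicBarrierChannel
    nWorkers::Int
    jobs::Channel{Nothing}        # boss -> workers
    completions::Channel{Nothing} # workers -> boss
end
CyclicBarrierChannel(n) = CyclicBarrierChannel(n, Channel{Nothing}(n), Channel{Nothing}(n))

function workerDoneAndWait(barrier)
    put!(barrier.completions, nothing)
    take!(barrier.jobs)
end
bossWaitForWorkers(barrier) = foreach(_ -> take!(barrier.completions), 1:barrier.nWorkers)
bossReleaseWorkers(barrier) = foreach(_ -> put!(barrier.jobs, nothing), 1:barrier.nWorkers)

function demo(nworkers, nrounds)
    barrier = CyclicBarrierChannel(nworkers)
    produced = Channel{Int}(nworkers * nrounds)  # buffered, so puts never block
    workers = map(1:nworkers) do i
        Threads.@spawn for _ in 1:nrounds
            put!(produced, i)          # Ai produces something for B...
            workerDoneAndWait(barrier) # ...then parks until B responds
        end
    end
    log = Int[]
    for _ in 1:nrounds
        bossWaitForWorkers(barrier)    # all workers are parked now
        while isready(produced)        # B processes everything produced so far
            push!(log, take!(produced))
        end
        bossReleaseWorkers(barrier)    # hand out the next round of work orders
    end
    foreach(wait, workers)
    log
end

log = demo(3, 2)  # six items total: one per worker per round
```

Because every worker hands in exactly one item before parking, B sees exactly one item from each worker in every round, regardless of how the tasks get scheduled.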
Now, you can of course also brew your own thing in analogy to Java's `CyclicBarrier` (Java Platform SE 8).
That's appropriate if the code is mostly for yourself (yay, learning experience), if your code is too performance-critical for the redundant extra thread switch from iteratively calling `take!`, or if you're writing a big library for many users. Otherwise, the code review and maintenance burden for a latch is probably not worth it.
A non-pragmatic, by-hand solution (not tested) would be something like:
```
mutable struct CyclicBarrier
    const nWorkers::Int
    @atomic nMissing::Int   # workers that have not yet arrived this cycle
    @atomic generation::Int # bumped by the boss on every release
    workers::Base.GenericCondition{ReentrantLock}
    boss::Base.GenericCondition{ReentrantLock}
end

CyclicBarrier(nWorkers) = CyclicBarrier(nWorkers, nWorkers, 0, Threads.Condition(), Threads.Condition())

function workerDoneAndWait(barrier)
    @lock barrier.workers.lock begin
        gen = @atomic barrier.generation
        nMissing = @atomic barrier.nMissing -= 1
        # the last worker to arrive wakes the boss
        nMissing == 0 && @lock barrier.boss.lock notify(barrier.boss)
        # park until the boss has released this generation
        while (@atomic barrier.generation) == gen
            wait(barrier.workers)
        end
    end
end

function bossWaitForWorkers(barrier)
    @lock barrier.boss.lock while (@atomic barrier.nMissing) != 0
        wait(barrier.boss)
    end
end

function bossReleaseWorkers(barrier)
    @lock barrier.workers.lock begin
        @assert 0 == @atomicswap barrier.nMissing = barrier.nWorkers
        @atomic barrier.generation += 1
        @assert barrier.nWorkers == notify(barrier.workers)
    end
end
```
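As a sanity check, here is a self-contained demo of a barrier along these lines, built from atomics and condition variables, with a generation counter so that early-arriving workers stay parked until the boss releases the current cycle. Definitions are inline so it runs standalone; `demo` and the round counts are made-up names for this sketch.

```julia
mutable struct CyclicBarrier
    const nWorkers::Int
    @atomic nMissing::Int   # workers not yet arrived this cycle
    @atomic generation::Int # bumped on every release
    workers::Base.GenericCondition{ReentrantLock}
    boss::Base.GenericCondition{ReentrantLock}
end
CyclicBarrier(n) = CyclicBarrier(n, n, 0, Threads.Condition(), Threads.Condition())

function workerDoneAndWait(b)
    @lock b.workers.lock begin
        gen = @atomic b.generation
        ((@atomic b.nMissing -= 1) == 0) && @lock b.boss.lock notify(b.boss)
        while (@atomic b.generation) == gen # park until the boss releases this cycle
            wait(b.workers)
        end
    end
end

function bossWaitForWorkers(b)
    @lock b.boss.lock while (@atomic b.nMissing) != 0
        wait(b.boss)
    end
end

function bossReleaseWorkers(b)
    @lock b.workers.lock begin
        @assert 0 == @atomicswap b.nMissing = b.nWorkers
        @atomic b.generation += 1
        notify(b.workers)
    end
end

function demo(nworkers, nrounds)
    barrier = CyclicBarrier(nworkers)
    produced = Channel{Int}(nworkers * nrounds)
    tasks = map(1:nworkers) do i
        Threads.@spawn for _ in 1:nrounds
            put!(produced, i)          # produce for B
            workerDoneAndWait(barrier) # park until B releases this round
        end
    end
    log = Int[]
    for _ in 1:nrounds
        bossWaitForWorkers(barrier)
        while isready(produced)        # B processes everything from this round
            push!(log, take!(produced))
        end
        bossReleaseWorkers(barrier)
    end
    foreach(wait, tasks)
    log
end

log = demo(3, 2)
```

Note that the boss here is only notified once per cycle, by the last worker to arrive, which is exactly the scheduler-overhead saving over the channel version.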
Thanks, this is giving me some directions to try. For now this is for a personal project and my own education, but I’m still interested in having reasonably good performance. I’m curious why you say the solutions above are not performant or non-pragmatic.
The pragmatic channel solution wakes up the “boss” task (your task B) on every completion of a worker. This is of course silly: it only needs to be woken when the last worker completes. Hence, it has about 2x the scheduler overhead.
But it is obviously correct, it's obvious to write and to reason about, and it's easy to explain to people with an analogy (“oh, we hand out work orders and completion receipts between workers and boss thread, using a well-known channel abstraction”).
The extra 2x scheduler overhead should not matter very much for performance in most use cases: if scheduler overhead is a serious limitation, then your algorithm sucks and you should fix that.
On the other side, writing and reviewing the kind of locking/atomics code of the by-hand solution is not entirely trivial, and bugs happen. The empirical rate of race-condition bugs in “let's do concurrency by hand, can't be so hard” code is very high.
Yet, it's not aesthetically pleasing to burn all these perfectly fine CPU cycles for “humility” when one has an “obviously” correct solution. (But I must admit that I have produced plenty of data races in “obviously correct” code, due to faulty reasoning.)
So it comes down to your project goals. Building your own is more fun, more educational, and aesthetically pleasing; but it is certainly not pragmatic, and the stupid wasteful work-slip channel should be fast enough.
Not quite the same, but FWIW, 1.12 will have a `waitall` function (see julia/NEWS.md at v1.12.0-beta1 · JuliaLang/julia · GitHub).