How to start tasks on multiple threads and control terminal output from a central thread

While I am not new to distributed computing, I have trouble wrapping my head around how to implement the following behavior in Julia.

I have a testing tool that takes a bunch of tests (= directories with testing information), runs the tests, and displays the results on the terminal. Since all tests are serial, I would like to parallelize this in the following way (roughly):

  • one central task/thread/worker (not sure about the correct terminology, let’s call it “root”) distributes the tests among the available CPU cores
  • on each core, a single test is run concurrently
  • once a test has finished, the result should be displayed on the terminal (with actual output being controlled by root), and the now idle core picks up the next test
  • continue until all tests are finished

What would be the Julian way of implementing something like this? Or is this even how you would achieve the desired behavior (tests running on all cores, one core controlling I/O) with Julia?

I tried reading the official docs, but they describe so many ways of doing something in parallel that I am not sure which one to pursue. Also, most tutorials and documentation I found online refer to pre-1.3 Julia (or even pre-1.0), so I assume they are at least partly outdated.

Any help would be highly appreciated, even if it’s just pointing me to a good reference implementation or tutorial!

Perhaps use a Channel (RemoteChannel) to communicate between the test runner and the root and have the root be in charge of all printing.
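A minimal sketch of the Channel idea, using threads within a single process rather than a RemoteChannel across processes. `run_test` and `run_all` are hypothetical names made up for illustration, and the test names are placeholders. Runner tasks only push results into a channel; the root is the only one that prints.

```julia
# Hypothetical stand-in for running one test directory.
run_test(name) = (sleep(0.01); "passed")

function run_all(tests; io=stdout)
    jobs = Channel{String}(length(tests))
    results = Channel{Tuple{String,String}}(length(tests))
    for t in tests
        put!(jobs, t)
    end
    close(jobs)  # iterating over `jobs` ends once it has been drained

    # One runner per thread; an idle runner takes the next job from the channel.
    runners = map(1:Threads.nthreads()) do _
        Threads.@spawn for t in jobs
            put!(results, (t, run_test(t)))
        end
    end

    # Close `results` once every runner is done, so the loop below terminates.
    @async begin
        try
            foreach(wait, runners)
        finally
            close(results)
        end
    end

    # Only the root touches the terminal, printing results as they arrive.
    finished = Dict{String,String}()
    for (name, status) in results
        println(io, name, ": ", status)
        finished[name] = status
    end
    return finished
end
```

Calling `run_all(["a", "b", "c"])` prints one line per test as results come in. Start Julia with `-t N` to get more than one runner thread.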

Julia itself runs its tests in parallel, and the implementation is quite simple: julia/runtests.jl at master · JuliaLang/julia · GitHub. It just calls remotecall_fetch on the workers from a bunch of async tasks, each of which pops the first available test from a list of tests to run.
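Here is a stripped-down sketch of that pattern, not the actual code from runtests.jl. `run_test` and `run_distributed` are hypothetical names, and the `addprocs` call is left commented out so the snippet also runs in a session without extra workers (in which case everything runs on process 1).

```julia
using Distributed
# addprocs(4)  # assumption: uncomment to start four worker processes

# Hypothetical stand-in for a real test; must be defined on every process.
@everywhere run_test(name) = (sleep(0.01); (name, :passed))

function run_distributed(tests; io=stdout)
    queue = copy(tests)
    results = Dict{String,Symbol}()
    @sync for p in workers()
        # One feeder task per worker; all feeder tasks live on the root process.
        @async while !isempty(queue)
            name = popfirst!(queue)
            # Blocks this task (not the whole process) until worker p is done.
            _, status = remotecall_fetch(run_test, p, name)
            # Printing happens on the root, so output lines are never interleaved.
            println(io, "worker ", p, ": ", name, " ", status)
            results[name] = status
        end
    end
    return results
end
```

For example, `run_distributed(["a", "b", "c"])` prints one line per test. Since the feeder tasks all run on the root and only switch at blocking calls, the `isempty`/`popfirst!` pair needs no lock.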


Are you referring to these lines: https://github.com/JuliaLang/julia/blob/36241a90bab7256f83392dc302808f04954a1f3b/test/runtests.jl#L193-L209? I am not sure whether I understand correctly what is going on there.

Let me see if I get this right…

In line 193, a @sync block begins that ensures that execution does not proceed until all @async blocks are finished. The next line loops over all workers (we are currently on root) and creates a Task (?) using @async until we have a task for each worker. The creation of tasks is non-blocking.

In line 197, each task (which is associated with a specific worker) loops as long as the list of tests, tests, is non-empty. In each loop iteration, the next test is retrieved (line 198), which is then to be executed remotely. For this, remotecall_fetch is called with the worker id associated with this task. remotecall_fetch is blocking, i.e., it waits for the test to be finished.

Once the call to remotecall_fetch returns, further operations are performed (e.g., I/O, cleanup etc.) before the next loop iteration begins. If all tests are gone, the end of the @async block is reached and thus the code will wait at the end of the @sync block until all tasks have reached this position in the code, at which point all tasks are dissolved and normal (serial) execution continues.
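The @sync/@async behavior described above can be checked with a toy example. This is only a sketch; `sync_demo` is a made-up name, and the sleeps stand in for blocking calls such as remotecall_fetch.

```julia
function sync_demo()
    order = String[]
    @sync begin
        for i in 1:3
            # Creating a task does not block; it starts once the current task yields.
            @async begin
                sleep(0.1 * (4 - i))  # task 3 sleeps shortest, task 1 longest
                push!(order, "task $i done")
            end
        end
        push!(order, "all tasks created")
    end  # execution waits here until all three tasks have finished
    push!(order, "after @sync")
    return order
end
```

The returned vector should start with "all tasks created", followed by the tasks in order of their sleep times, and end with "after @sync".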

Did I get this (roughly) right? If not, I’d be happy to learn where I went wrong…

Otherwise, I have a few follow-up questions:

  1. Is it correct that in this setup the root never runs any tests but only schedules tasks and handles I/O?
  2. If yes, why the check for p != 1 in lines 237-240 (https://github.com/JuliaLang/julia/blob/36241a90bab7256f83392dc302808f04954a1f3b/test/runtests.jl#L237-L240)? Shouldn’t this always evaluate to true, since workers() returns a list that excludes 1?
  3. Is it reasonable to assume that on a multicore processor (e.g., 4 cores and 8 HW threads) each task will be executed on a different core?

Sorry for the long post, but I see an opportunity to finally get the knots in my head untied when it comes to Julia’s distributed programming model 🤯

Yes, looks good to me.

Yes, with the exception of some special tests that run on the root.

I think that is correct.
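One case that may be worth checking here, offered as a hedged aside rather than as an explanation of what runtests.jl intends: in a session where no worker processes have been added, workers() returns [1], so a p != 1 check is not always true. A small sketch to verify this in your own session:

```julia
using Distributed

ids = workers()
if nprocs() == 1
    # No worker processes added: the root itself is reported as the only worker.
    @assert ids == [1]
else
    # With extra processes, the root (id 1) is excluded from the worker list.
    @assert !(1 in ids)
end
println("nprocs = ", nprocs(), ", workers = ", ids)
```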

AFAIU, it is up to the OS to do the exact scheduling onto physical cores but yes, that should happen.

@kristoffer.carlsson Thanks a lot for the feedback & clarifications! I have one further question, if I may: Is it reasonable/sensible to run tests also on proc=1? As far as I can tell, the code in the referenced runtests.jl does not execute tests in parallel on it (except for the node1 tests). Or would tests with high CPU utilization prevent other tasks on proc=1 from functioning properly (e.g., for scheduling new tests, I/O etc.)?

There is no real advantage to running tests on proc=1: the number of workers is equal to the number of threads, so things will be saturated anyway. And yes, if you run a test on proc=1 that takes a significant time and has no yield points, it will block the printing task (and the scheduling task) from running, which is undesirable.
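The effect of missing yield points can be reproduced with a small sketch. `busy` and `demo` are hypothetical names; the "printer" task stands in for the printing/scheduling task on the root.

```julia
# CPU-bound loop; it only yields to other tasks if `yield_every` is positive.
function busy(n; yield_every=0)
    s = 0.0
    for i in 1:n
        s += sin(i)
        if yield_every > 0 && i % yield_every == 0
            yield()  # explicit yield point: lets other tasks run
        end
    end
    return s
end

function demo(; yield_every=0)
    log = String[]
    # Scheduled right away, but it can only run once the current task yields.
    printer = @async push!(log, "printer ran")
    busy(10^6; yield_every=yield_every)
    push!(log, "busy finished")
    wait(printer)
    return log
end
```

Without yield points (`demo()`), the printer task should only get to run after the busy loop has finished. With `demo(yield_every=1000)`, it runs at the first yield, before the loop completes.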