How to start tasks on multiple threads and control terminal output from central thread

sloede · March 17, 2020, 8:43am

While I am not new to distributed computing, I have trouble wrapping my head around how to implement the following behavior in Julia.

I have a testing tool that takes a bunch of tests (= directories with testing information), runs the tests, and displays the results on the terminal. Since all tests are serial, I would like to parallelize this in the following way (roughly):

one central task/thread/worker (not sure about the correct terminology, let’s call it “root”) distributes the tests among the available CPU cores
on each core, a single test is run concurrently
once a test has finished, the result should be displayed on the terminal (with actual output being controlled by root), and the now idle core picks up the next test
continue until all tests are finished

What would be the Julian way of implementing something like this? Or is this even how you would achieve the desired behavior (tests running on all cores, one core controlling I/O) with Julia?

I tried reading the official docs, but they describe so many ways of doing something in parallel that I am not sure which one to pursue. Also, most tutorials and documentation I found online refer to pre-1.3 Julia (or even pre-1.0), so I am sure they are at least partly outdated.

Any help would be highly appreciated, even if it’s just pointing me to a good reference implementation or tutorial!

kristoffer.carlsson · March 17, 2020, 9:48am

Perhaps use a Channel (RemoteChannel) to communicate between the test runner and the root and have the root be in charge of all printing.
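A minimal sketch of that idea, using a plain Channel and tasks within a single process (a RemoteChannel would play the same role across worker processes). It assumes Julia 1.3 or later for Threads.@spawn; run_test and run_all are hypothetical names made up for illustration:

```julia
# Hypothetical stand-in for running one test directory; returns a result line.
run_test(name) = (sleep(0.01); "$name: passed")

function run_all(tests; io = stdout)
    jobs = Channel{String}(length(tests))      # work queue
    results = Channel{String}(length(tests))   # buffered, so runners never block on put!
    foreach(t -> put!(jobs, t), tests)
    close(jobs)                                # iteration over `jobs` ends once it is drained

    # One runner task per available thread; each pulls the next pending test.
    runners = map(1:Threads.nthreads()) do _
        Threads.@spawn for t in jobs
            put!(results, run_test(t))
        end
    end

    # Close `results` once every runner is done, so the loop below terminates.
    @async begin
        try
            foreach(wait, runners)
        finally
            close(results)
        end
    end

    # "Root": the only place that writes to the terminal.
    printed = String[]
    for r in results
        println(io, r)
        push!(printed, r)
    end
    return printed
end
```

Calling run_all(["test_a", "test_b", "test_c"]) prints one line per test as it finishes; the order depends on scheduling.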

Julia itself runs its tests in parallel, and the implementation is quite simple: julia/runtests.jl at master · JuliaLang/julia · GitHub. It just calls remotecall_fetch on the workers from a bunch of async tasks, each of which pops the first available test from a list of tests to run.
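A stripped-down sketch of that pattern (not the actual code from runtests.jl). The two local workers, run_test, and run_distributed are assumptions made up for illustration:

```julia
using Distributed

addprocs(2)   # assumption: two local worker processes, just for illustration

# Hypothetical test function; @everywhere defines it on every process.
@everywhere function run_test(name)
    sleep(0.1 * rand())
    return (name = name, worker = myid(), ok = true)
end

function run_distributed(tests; io = stdout)
    pending = copy(tests)
    results = []
    @sync for p in workers()
        # One task per worker; all of these tasks live on the calling process.
        @async while !isempty(pending)
            # No yield point between isempty and popfirst!, so the tasks
            # cannot race for the same test.
            t = popfirst!(pending)
            r = remotecall_fetch(run_test, p, t)   # blocks this task only
            println(io, "$(r.name) finished on worker $(r.worker)")
            push!(results, r)
        end
    end
    return results
end
```

All printing happens on the process that calls run_distributed, while the tests themselves run on the workers.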

sloede · March 17, 2020, 12:37pm

Are you referencing those lines: https://github.com/JuliaLang/julia/blob/36241a90bab7256f83392dc302808f04954a1f3b/test/runtests.jl#L193-L209? I am not sure whether I understand correctly what is going on there:

Let me see if I get this right…

In line 193, a @sync block begins that ensures that execution does not proceed until all @async blocks are finished. The next line loops over all workers (we are currently on root) and creates a Task (?) using @async until we have a task for each worker. The creation of tasks is non-blocking.

In line 197, each task (which is associated with a specific worker) loops for as long as the list of tests, tests, is non-empty. In each loop iteration, the next test is retrieved (line 198) and then executed remotely. For this, remotecall_fetch is called with the worker id associated with this task. remotecall_fetch is blocking, i.e., it waits for the test to be finished.

Once the call to remotecall_fetch returns, further operations are performed (e.g., I/O, cleanup etc.) before the next loop iteration begins. If all tests are gone, the end of the @async block is reached, and the code waits at the end of the @sync block until all tasks have reached this position, at which point all tasks are dissolved and normal (serial) execution continues.
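As a self-contained illustration of the @sync/@async behavior described above (this is not the code from runtests.jl, and demo_sync is a made-up name):

```julia
function demo_sync()
    order = String[]
    @sync begin
        for i in 1:3
            @async begin
                sleep(0.05 * (4 - i))        # task 3 sleeps shortest, task 1 longest
                push!(order, "task $i done")
            end
        end
        # Creating tasks does not block, so this runs before any task finishes.
        push!(order, "all tasks created")
    end
    # Only reached once every @async task above has completed.
    push!(order, "after @sync")
    return order
end
```

The first entry of the returned vector is "all tasks created" and the last is "after @sync"; the tasks typically finish in the order 3, 2, 1 because of their sleep times.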

Did I get this (roughly) right? If not, I’d be happy to learn where I went wrong…

Otherwise, I have two follow-up questions:

Is it correct that in this setup the root never runs any tests but only schedules tasks and handles I/O?
If yes, why the check for p != 1 in lines 237-240 (https://github.com/JuliaLang/julia/blob/36241a90bab7256f83392dc302808f04954a1f3b/test/runtests.jl#L237-L240)? Shouldn’t this always evaluate to true, since workers() returns a list that excludes 1?
Is it reasonable to assume that on a multicore processor (e.g., 4 cores and 8 HW threads) each task will be executed on a different core?

Sorry for the long post, but I see an opportunity to finally get the knots in my head untied when it comes to Julia’s distributed programming model.

kristoffer.carlsson · March 17, 2020, 12:49pm

Yes, looks good to me.

Yes with the exception of some special tests which run on the root:

https://github.com/JuliaLang/julia/blob/36241a90bab7256f83392dc302808f04954a1f3b/test/runtests.jl#L246

                      end
                      if p != 1
                          # Free up memory =)
                          rmprocs(p, waitfor=30)
                      end
                  end
              end
          end
          
          n > 1 && length(node1_tests) > 1 && print("\nExecuting tests that run on node 1 only:\n")
          for t in node1_tests
              # As above, try to run each test
              # which must run on node 1. If
              # the test fails, catch the error,
              # and either way, append the results
              # to the overall aggregator
              isolate = true
              t == "SharedArrays" && (isolate = false)
              local resp
              try
                  resp = eval(Expr(:call, () -> runtests(t, test_path(t), isolate, seed=seed))) # runtests is defined by the include above

I think that is correct.
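One caveat that is easy to check in a REPL: workers() only excludes 1 once additional processes exist. In a session with no extra processes it returns [1], which is presumably why the check matters when the suite runs on a single process (that last part is my assumption, not something stated in runtests.jl). A small sketch, assuming a fresh session started without -p:

```julia
using Distributed

before = workers()    # [1] in a fresh session without extra processes
new = addprocs(2)     # assumption: two local workers, just for illustration
after = workers()     # process 1 is no longer listed
rmprocs(new)          # clean up the illustrative workers

println("before addprocs: ", before)
println("after addprocs:  ", after)
```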

AFAIU, it is up to the OS to do the exact scheduling onto physical cores but yes, that should happen.

sloede · March 17, 2020, 1:18pm

@kristoffer.carlsson Thanks a lot for the feedback & clarifications! I have one further question, if I may: Is it reasonable/sensible to also run tests on proc=1? As far as I can tell, the code in the referenced runtests.jl does not execute tests in parallel on it (except for the node1 tests). Or would tests with high CPU utilization prevent other tasks on proc=1 from functioning properly (e.g., for scheduling new tests, I/O etc.)?

kristoffer.carlsson · March 17, 2020, 1:29pm

There is no real advantage to running tests on proc=1; the number of workers is equal to the number of threads, so things will be saturated anyway. And yes, if you run a test on proc=1 that takes a significant time and has no yield points, it will block the printing task (and the scheduling task) from running, which is undesirable.
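A small sketch of the yield-point issue, with a counter task standing in for the printing task. busy and demo are hypothetical names, and the loop sizes are arbitrary:

```julia
# CPU-bound loop; optionally yields to other tasks every 1000 iterations.
function busy(n; cooperative)
    s = 0.0
    for i in 1:n
        s += sin(i)
        cooperative && i % 1000 == 0 && yield()
    end
    return s
end

function demo(cooperative)
    ticks = Ref(0)
    done = Ref(false)
    # Stand-in for a printing/scheduling task on the same process.
    ticker = @async while !done[]
        ticks[] += 1
        yield()
    end
    busy(200_000; cooperative = cooperative)
    n = ticks[]          # how often the ticker got to run during the busy loop
    done[] = true
    wait(ticker)
    return n
end
```

demo(false) returns 0 because the ticker never gets a chance to run while the loop hogs the task's thread, whereas demo(true) returns a positive count.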
