Optimizing multi-threading of tasks of varying duration

When you run @spawn x, it spawns a task running x and returns that task immediately, without waiting for it to finish. So in your latter example, @tasks (which itself spawns tasks) runs the loop, and each iteration of the loop then spawns a further task using @spawn. The @tasks macro only waits for the tasks it spawned itself; it has no way of knowing that it should also wait for the ones created inside the loop body.
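To see the "returns immediately" behaviour in isolation, here is a minimal sketch. The sleep and the value 42 are placeholders of mine, not something from your code:

```julia
# @spawn hands back a Task right away; the body runs in the background.
t = Threads.@spawn begin
    sleep(1)   # stand-in for slow work
    42
end

# At this point the task has most likely not finished yet.
println(istaskdone(t))

# fetch blocks until the task is done and returns its result.
println(fetch(t))
```

Anything that should wait for the work has to call wait or fetch on the task, or be wrapped in @sync.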

Here are two examples of how to do it:
Setup:

julia> using OhMyThreads

julia> run_shell() = (run(`sleep 2`); run(`echo hello`))
  • Option 1: Using @set ntasks
julia> @tasks for i in 1:8
           @set ntasks = 4
           run_shell() # blocks until the shell commands have finished
       end
  • Option 2: Using a semaphore and @spawn, without @tasks
julia> sem = Base.Semaphore(4);

julia> for i in 1:8
           Base.acquire(sem)
           Threads.@spawn begin
               try
                   run_shell()
               finally
                   Base.release(sem)
               end
           end
       end

The latter has two problems:

  1. The loop ends as soon as the last task has been spawned, not when it has finished, which is probably not what you want.
  2. If the shell command errors, the exception stays inside the failed task and goes unnoticed, because nothing waits on that task.
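The second problem can be reproduced without any shell command. In this small sketch, error("boom") is my stand-in for a failing run_shell():

```julia
# The task fails in the background; nothing is thrown at the spawn site.
t = Threads.@spawn error("boom")

# Only waiting on the task surfaces the failure, wrapped in a TaskFailedException.
caught = try
    wait(t)
    nothing
catch err
    err
end

println(caught isa TaskFailedException)  # true
println(istaskfailed(t))                 # true
```

Without the wait, the script would carry on as if nothing had gone wrong.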

If you have at most tens of thousands of tasks to spawn in total, rather than millions, I would recommend wrapping Option 2 in a @sync, which waits for all the spawned tasks to finish and rethrows any errors they raised:

julia> @sync begin
           sem = Base.Semaphore(4)
           for i in 1:8
               Base.acquire(sem)
               Threads.@spawn begin
                   try
                       run_shell()
                   finally
                       Base.release(sem)
                   end
               end
           end
       end
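If you want to convince yourself that the semaphore really caps concurrency, here is a rough check. It assumes sleep(0.2) as a stand-in for run_shell(), and the counters running and peak are names I made up for the illustration:

```julia
using Base.Threads: @spawn, Atomic, atomic_add!, atomic_sub!, atomic_max!

running = Atomic{Int}(0)  # tasks currently inside the guarded section
peak = Atomic{Int}(0)     # highest value of `running` seen so far

@sync begin
    sem = Base.Semaphore(4)
    for i in 1:8
        Base.acquire(sem)  # blocks while 4 tasks already hold a permit
        @spawn begin
            try
                # atomic_add! returns the old value, so add 1 to get the new count
                current = atomic_add!(running, 1) + 1
                atomic_max!(peak, current)
                sleep(0.2)  # stand-in for run_shell()
            finally
                atomic_sub!(running, 1)
                Base.release(sem)
            end
        end
    end
end

# By the time @sync returns, every task has finished.
println("peak concurrency: ", peak[])
println("still running: ", running[])
```

The reported peak should never exceed 4, and the running count is back at 0 once @sync returns.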