Worker thread setup in Julia compared to C++/pthreads

To keep the main OS thread (threadid() == 1) from being blocked, you’d need to start nthreads() - 1 workers, as you mentioned. A simple solution may be to start Julia with julia -t$number_of_cpus_plus_one and use something like:

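# Run `f` on every item of `input`, preferring tasks that start on a non-primary thread.
function foreach_on_non_primary(f, input::Channel)
    # Counts the tasks that landed on a non-primary thread.
    workers = Threads.Atomic{Int}(0)
    @sync for _ in 1:Threads.nthreads()
        # Only tasks that start on a thread other than 1 take work; the rest exit
        # immediately. The check runs once at task start, so this assumes the
        # task stays on that thread afterwards.
        Threads.@spawn if Threads.threadid() > 1
            Threads.atomic_add!(workers, 1)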
            for x in input
                f(x)
            end
        end
    end
    if workers[] == 0
        # Failed to start workers. Fallback to more robust (less controlled)
        # solution:
        @sync for _ in 1:Threads.nthreads()
            Threads.@spawn for x in input
                f(x)
            end
        end
    end
end

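# Fill the channel up front. This assumes work_channel is buffered with enough
# capacity for all items; otherwise push! blocks until a consumer takes an item.
for work in work_source
    push!(work_channel, work)
end
# Closing the channel replaces a sentinel value. Without it, the workers'
# `for x in input` loops never terminate (deadlock).
close(work_channel)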
foreach_on_non_primary(work_channel) do work
    push!(result_channel, compute(work))
end
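
For anyone who wants to try the channel setup in isolation, here is a minimal self-contained sketch of the fallback pattern (plain Threads.@spawn workers draining a closed channel). The names run_pool and compute are hypothetical stand-ins, and sizing both channels to hold every item is my assumption so that nothing blocks on a full channel:

```julia
# Hypothetical stand-in for the real per-item computation.
compute(x) = x^2

# Hypothetical helper: drain `work_source` with `ntasks` worker tasks and
# return the sorted results.
function run_pool(work_source; ntasks = Threads.nthreads())
    items = collect(work_source)
    # Buffered so that filling the channel up front cannot block (assumption).
    work_channel = Channel{eltype(items)}(length(items))
    result_channel = Channel{Any}(length(items))
    for work in items
        put!(work_channel, work)
    end
    # Closing lets each worker's `for` loop end once the channel is drained.
    close(work_channel)
    @sync for _ in 1:ntasks
        Threads.@spawn for work in work_channel
            put!(result_channel, compute(work))
        end
    end
    close(result_channel)
    # Results arrive in nondeterministic order, so sort before returning.
    return sort!(collect(result_channel))
end

results = run_pool(1:10)
```

This version leaves thread placement entirely to the scheduler, so it works with any thread count, including julia -t1.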

(Note to the wider audience: I’d discourage using an approach like this in released packages. Julia’s parallel task runtime needs to control task scheduling to make parallel programs composable, and manually managing scheduling like this defeats that design philosophy. On the other hand, if you are writing an application (not a library), I don’t think there is a problem with using tricks like this. For more discussion, see the documentation of FoldsThreads.TaskPoolEx and the announcement post “[ANN] FoldsThreads.jl: A zoo of pluggable thread-based data-parallel execution mechanisms”.)

I think the functions in ThreadPools.jl’s “background threading API”, such as bforeach and qbforeach, are roughly equivalent to the foreach_on_non_primary function I wrote above. But they are more reliable at distributing tasks across OS threads when the worker function f yields to the scheduler immediately.

Note that giving the user less control gives the system more control. This is a key theme in Julia (or perhaps in programming systems in general) for unlocking a larger class of optimizations. It is quite the opposite of the philosophy of a “systems” language like C++ or a glue language like Python. For example, we are looking into an even more restricted form of task parallelism (more discussion in my PR, “RFC: Experimental API for may-happen in parallel parallelism”, JuliaLang/julia Pull Request #39773). I agree that Julia’s task parallelism can be confusing if you are already familiar with programming using OS-level threads. But the lack of control is there to leave room for the system to improve performance.
