Big performance slowdown increasing Julia threads but keeping parallelism the same?

I have some code for processing files in parallel like so (simplified):

function process_files_parallel(paths, n_parallel = 12)
    # create to-do list
    todo_channel = Channel{eltype(paths)}(length(paths))
    for p in paths
        put!(todo_channel, p)
    end
    close(todo_channel)
    # spawn worker tasks
    tasks = map(1:n_parallel) do i
        Threads.@spawn begin
            for path in todo_channel
                do_something(path)
            end
        end
    end
    # wait for all worker tasks to conclude, meaning we processed all files
    wait.(tasks)
end

On a 16-core EC2 machine, timings for process_files_parallel(lst_of_36_paths, 6) (i.e., 6 worker tasks):

  • Running Julia (via the VSCode REPL, so including Revise etc.) with "julia.NumThreads": 6 => 36 seconds
  • With "julia.NumThreads": 16 => 70 seconds

I understand [some] reasons why running too many workers in parallel can lead to slowdowns: thrashing the CPU, worse cache locality, high context switching costs, etc. But why does increasing the number of threads available to Julia with the same number of workers produce a big slowdown? Can’t the extra threads just…do nothing? Is Julia spending tons of compute cycles juggling the 6 worker tasks between 16 threads?

I have to admit that I do not understand your code. Just some general remarks:

  • more threads means higher pressure on the garbage collector; make sure to benchmark this effect and check the percentage of time used by the GC
  • more threads put more pressure on the CPU cache
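
One way to check the GC share is the `@timed` macro, which reports total GC time alongside elapsed time. The sketch below is only illustrative: `gc_share` and `alloc_heavy` are hypothetical names, and the workload is a stand-in that allocates many short-lived arrays, not the original `do_something`.

```julia
# Hypothetical helper: run `f()` once and report how much of the wall time was spent in GC.
function gc_share(f)
    stats = @timed f()   # named tuple with fields such as `time` and `gctime` (seconds)
    pct = stats.time > 0 ? 100 * stats.gctime / stats.time : 0.0
    return (seconds = stats.time, gc_seconds = stats.gctime, gc_percent = pct)
end

# Stand-in workload (an assumption): allocate many short-lived arrays.
alloc_heavy() = sum(sum(rand(1_000)) for _ in 1:10_000)

gc_share(alloc_heavy)             # warm-up run so compilation is not measured
report = gc_share(alloc_heavy)
println("elapsed: ", round(report.seconds; digits = 3), " s, GC: ",
        round(report.gc_percent; digits = 1), " %")
```

Wrapping the real call, e.g. `gc_share(() -> process_files_parallel(paths, 6))`, under both thread settings would show whether the GC percentage grows with the thread count.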

You may be hitting lock and channel scalability issues, where threads are woken up too aggressively. There have been some recent changes to locks, but nothing yet to fix the channel behavior; see "ReentrantLock: wakeup a single task on unlock and add a short spin" by andrebsguedes (Pull Request #56814, JuliaLang/julia on GitHub).

If using a Channel is causing the issue, perhaps try accessing the paths via an index wrapped in Threads.Atomic.

Code Outline
function process_files_parallel2(paths, n_parallel = 12)
    # create to-do list
    todo_vector = collect(paths)
    path_index = Threads.Atomic{Int}(1)

    # spawn worker tasks
    @sync for t in 1:n_parallel
        Threads.@spawn begin
            while true
                # atomic_add! returns the value before the increment, so each task claims a unique index
                i = Threads.atomic_add!(path_index, 1)
                if i in eachindex(todo_vector)
                    do_something(todo_vector[i])
                else
                    break
                end
            end
        end
    end
end

It’s a bit difficult to trace the problem without a running example.
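
A self-contained reproduction might look roughly like the sketch below. It keeps the Channel-based structure from the question but swaps in a made-up workload: `fake_work` and `process_parallel` are hypothetical names, and the amount of work per "file" is an arbitrary assumption, so the timings will not match the original numbers.

```julia
# Stand-in for do_something (an assumption): some allocation plus arithmetic per "file".
function fake_work(path)
    acc = 0.0
    for _ in 1:200
        acc += sum(abs2, rand(10_000))
    end
    return acc
end

# Same structure as the code in the question, with the per-item work passed in.
function process_parallel(work, paths, n_parallel)
    todo = Channel{eltype(paths)}(length(paths))
    foreach(p -> put!(todo, p), paths)
    close(todo)
    tasks = map(1:n_parallel) do _
        Threads.@spawn begin
            for path in todo
                work(path)
            end
        end
    end
    foreach(wait, tasks)
    return nothing
end

paths = ["file_$i" for i in 1:36]
process_parallel(fake_work, paths, 6)   # warm-up run so compilation is not timed
elapsed = @elapsed process_parallel(fake_work, paths, 6)
println("Julia ", VERSION, ", nthreads = ", Threads.nthreads(),
        ", elapsed = ", round(elapsed; digits = 3), " s")
```

Saving this to a script and running it with `julia -t 6` and then `julia -t 16` would show whether the slowdown reproduces outside the VSCode REPL and independently of the real file processing.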

However, I had a similar, though not identical, issue some time ago concerning task scheduling. My issue was with a constant number of threads but a varying number of tasks. Task scheduling also depends on the number of threads, so it could be related. See "Task scheduling regression" (Issue #54101, JuliaLang/julia on GitHub).

Have you tried your example with Julia 1.10.7 and with the nightly version?