When I nest Threads.@threads loops, scheduling looks strange:
using Base.Threads
using Dates

# Threaded map: @threads splits eachindex(arr) into one contiguous chunk per thread.
function tmap(f, arr)
    out = similar(arr, Any)
    Threads.@threads for i in eachindex(arr)
        out[i] = f(arr[i])
    end
    return [out...]
end

# Elapsed time since the run started.
reltime() = now() - t0

function stamp(args...)
    println("$(reltime()) $args")
end

@show Threads.nthreads()
t0 = now()
tmap(1:2) do i
    tmap(10:10:100) do j
        stamp(i, j)
        sleep(1)
    end
end
Threads.nthreads() = 8
97 milliseconds (1, 60) # 9 threads used
97 milliseconds (1, 70) # 9 threads used
97 milliseconds (1, 80) # 9 threads used
97 milliseconds (1, 100) # 9 threads used
97 milliseconds (2, 10) # 9 threads used
97 milliseconds (1, 90) # 9 threads used
97 milliseconds (1, 50) # 9 threads used
97 milliseconds (1, 30) # 9 threads used
97 milliseconds (1, 10) # 9 threads used
1098 milliseconds (2, 20) # 3 threads used
1098 milliseconds (1, 20) # 3 threads used
1099 milliseconds (1, 40) # 3 threads used
2100 milliseconds (2, 30) # 1 threads used
3101 milliseconds (2, 40) # 1 threads used
4102 milliseconds (2, 50) # 1 threads used
5103 milliseconds (2, 60) # 1 threads used
6105 milliseconds (2, 70) # 1 threads used
7106 milliseconds (2, 80) # 1 threads used
8107 milliseconds (2, 90) # 1 threads used
9108 milliseconds (2, 100) # 1 threads used
So initially this runs 9 iterations in parallel (I had expected at most 8, since nthreads() == 8?). Then it uses only 3 threads, and then it is single-threaded for a long time. Is this expected? Is this a bug? Why does it behave this way?
I don’t believe nested threads are really “handled”. My understanding of the @threads macro is that it breaks the loop up into nthreads() chunks of items, and each chunk is processed on its own thread. So if there are only 2 items in the loop, this will only use 2 threads: thread 1 gets item 1 and thread 2 gets item 2.
If you have 8 threads and 12 items, then I believe the distribution would be something like:
[1, 2]
[3, 4]
[5, 6]
[7, 8]
[9]
[10]
[11]
[12]
In which case the last 4 threads are doing half the work of the first 4, so you will have all 8 threads running to start, then only 4 once the others complete their chunks. The gap can widen further if items such as 10 and 11 happen to be quick to process, since those threads drop out almost immediately.
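A rough way to see that chunking for yourself, as a sketch (this assumes Julia 1.5+ for the explicit :static schedule option; earlier versions chunk this way by default, and the exact split depends on nthreads()):

using Base.Threads

# Record which thread handles each of 12 items under static scheduling.
owner = zeros(Int, 12)
Threads.@threads :static for i in 1:12
    owner[i] = threadid()
end
@show owner   # e.g. [1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8] with 8 threads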
This is in the process of changing. Through version 1.4, an inner nested @threads would only schedule work across threads if it was itself running on thread 1. The outer loop here splits into two threads, and only one of those two sees multithreading inside it; the other runs its inner loop serially on its own thread. So the “three threads” portion is simply the straggling second elements of the two 2-item chunks from the inner loop on i == 1, plus the serial inner loop for i == 2 still ticking along; after that, only the serial i == 2 loop remains. This is changing in 1.5 and will likely change again so that nested @threads fully participates in the depth-first scheduling queue. Version 1.5:
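For comparison, here is a rough sketch of the task-based style that composes under nesting, built only on Threads.@spawn (available since Julia 1.3); the name tmap_spawn is mine, not an existing API:

using Base.Threads

# One task per element. Nested calls all feed the same scheduler, so the
# inner and outer levels can share the available threads instead of each
# @threads loop pinning whole chunks to particular threads.
function tmap_spawn(f, arr)
    tasks = [Threads.@spawn f(x) for x in arr]
    return map(fetch, tasks)
end

tmap_spawn(1:2) do i
    tmap_spawn(10:10:100) do j
        sleep(1)   # stand-in for real work
        (i, j)
    end
end

With this version every (i, j) pair becomes its own task, so the scheduler is free to keep threads busy until the work runs out rather than leaving one long serial tail.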