Yes, I see you are right!
using Base.Threads, BenchmarkTools
# a silly function that burns some time and returns the id of the thread it ran on
f(n) = (sum((i for i in 1:n) .^ 2); threadid())
function show_load(threads)
    res = fill("", nthreads())
    foreach(i -> res[i] *= "*", threads)  # one star per task that ran on thread i
    res
end
Then:
julia> @btime f(2_000)
1.174 μs (2 allocations: 31.50 KiB)
1
julia> fetch.(map((_->Threads.@spawn f(2000)), 1:nthreads()))
8-element Array{Int64,1}:
3
2
4
5
6
7
8
1
If we put the same load on all tasks, all threads are employed. Even with unbalanced load (M >> N, where M is the number of tasks and N = nthreads()), the distribution is quite good:
julia> show_load(fetch.(map(_->(Threads.@spawn f(rand(1:2000))), 1:500)))
8-element Array{String,1}:
"****************************************************"
"****************************************************************************************"
"**************************************************************************"
"*********************************"
"********************************************************"
"***********************************************************"
"***********************************************************************"
"*******************************************************************"
I first had a different impression because in my applications the tasks usually start by reading from a channel. In that case the load is very imbalanced:
g(n) = (yield(); f(n))  # simulate a task that first waits, e.g. on a channel
julia> show_load(fetch.(map(_->(Threads.@spawn g(rand(1:2000))), 1:500)))
8-element Array{String,1}:
"*"
"*************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************"
"*"
"*"
"*"
"*"
"*"
"*"
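If that imbalance matters, one workaround (a sketch of my own, not from the discussion above; `run_balanced` and its internals are hypothetical names) is to avoid spawning one short task per job and instead spawn one long-lived worker per thread, each pulling jobs from a shared Channel. Then an early yield/wait cannot pile all the work onto a single thread:

```julia
using Base.Threads

# Sketch: one worker task per thread drains a job Channel;
# each worker records which thread its jobs ran on.
function run_balanced(jobs)
    ch = Channel{Int}(length(jobs))
    foreach(j -> put!(ch, j), jobs)
    close(ch)                                  # workers stop when ch is drained
    results = Channel{Int}(length(jobs))
    @sync for _ in 1:nthreads()
        Threads.@spawn for n in ch             # `for n in ch` pulls until closed+empty
            sum((1:n) .^ 2)                    # the dummy work from f(n)
            put!(results, threadid())
        end
    end
    close(results)
    collect(results)                           # thread ids, one per completed job
end
```

Feeding the result into `show_load` should then show stars on every thread, since each worker is pinned to whichever thread the scheduler gave it and all workers stay busy until the channel is empty.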