Maybe I misinterpreted a point here. You can tune the load balancing in this pattern by setting the number of tasks (or the task size). That is actually a good way to control how the parallel code runs, depending on the problem. And you can (with current Julia) emulate the future behavior of `@threads` using `@spawn`.
For example, here I create a function which may have poor load balancing, because each iteration sleeps for a fraction of a second:
julia> function test(lapses, ntasks)
           s = zeros(ntasks)
           Threads.@sync for it in 1:ntasks
               Threads.@spawn for i in it:ntasks:length(lapses)
                   sleep(lapses[i])
                   s[it] += lapses[i]
               end
           end
           sum(s)
       end
test (generic function with 1 method)
julia> lapses = [ 1e-3*i for j in 1:50 for i in 1:4 ];
julia> sum(lapses) # total sleep time
0.5000000000000001
julia> @btime test($lapses,1) # one task
756.380 ms (1027 allocations: 31.58 KiB)
0.5000000000000003
julia> @btime test($lapses,4) # ntasks == nthreads
258.188 ms (1046 allocations: 33.19 KiB)
0.5000000000000003
julia> @btime test($lapses,20) # ntasks >> nthreads
50.320 ms (1141 allocations: 42.12 KiB)
0.5
Thus, if the tasks are very heterogeneous, you can improve balancing by controlling the number of tasks. Ideally, the code can set up an optimal task size depending on the input, taking the problem characteristics into consideration and avoiding both excessive and insufficient task spawning, both of which can be detrimental to performance.
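One way to automate that choice is to tie the number of tasks to `nthreads()`. Here is a minimal sketch, assuming a hypothetical oversubscription factor `tasks_per_thread` (the helper name and the factor of 4 are my own illustration, not part of the example above):

```julia
using Base.Threads: nthreads, @spawn

# Hypothetical helper: choose the number of tasks as a small multiple of
# the number of threads, so the scheduler has room to balance uneven work
# without paying for one task per item. The factor of 4 is an arbitrary
# assumption; a real code would tune it for the problem at hand.
function sum_balanced(f, xs; tasks_per_thread = 4)
    ntasks = max(1, min(length(xs), tasks_per_thread * nthreads()))
    s = zeros(ntasks)
    @sync for it in 1:ntasks
        # Each task takes every ntasks-th element, as in the example above,
        # and accumulates into its own slot of s (no data race).
        @spawn for i in it:ntasks:length(xs)
            s[it] += f(xs[i])
        end
    end
    return sum(s)
end
```

With `tasks_per_thread = 1` this degenerates into the `ntasks == nthreads` case benchmarked above; larger factors give the scheduler more slack when iterations are heterogeneous.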
Anyway, this may be off-topic, but who knows.