I want to parallelize an algorithm on trees. This should be pretty easy in principle: At each step, I can split into two subproblems that can be handled independently (inplace). Unfortunately I always get only two threads used. Minimal example:
julia> Threads.nthreads()
4
julia> target = collect(1:5);
julia> function mtfor(target)
Threads.@threads for i=1:length(target)
target[i]=Threads.threadid()
end
nothing
end
julia> mtfor(target); target
5-element Array{Int64,1}:
1
1
2
3
4
So that works nice. However,
julia> function recmt(target)
(length(target)<1) && return nothing
target[1]=Threads.threadid()
randn(10_000_000); #something slow
length(target)==1 && return nothing
Threads.@threads for i=1:2
(i==1) && recmt(view(target, 2:( length(target)>>1)))
(i==2) && recmt(view(target, (1+ length(target)>>1):length(target)))
end
nothing
end
julia> recmt(target);target
5-element Array{Int64,1}:
1
1
2
2
2
Naively, I would have hoped that julia has a global pool of worker threads, and each @threads
generates new tasks that get consumed by the workers (and the onus is on me to choose appropriate thresholds for splitting). So what am I doing wrong here?