Recursive multithreading

I want to parallelize an algorithm on trees. This should be pretty easy in principle: At each step, I can split into two subproblems that can be handled independently (inplace). Unfortunately I always get only two threads used. Minimal example:

julia> Threads.nthreads()
4
julia> target = collect(1:5);
julia> function mtfor(target)
       Threads.@threads for i=1:length(target)
       target[i]=Threads.threadid()
       end 
       nothing
       end
julia> mtfor(target); target
5-element Array{Int64,1}:
 1
 1
 2
 3
 4

So that works nice. However,

julia> function recmt(target)
       (length(target)<1) && return nothing
       target[1]=Threads.threadid()
       randn(10_000_000); #something slow
       length(target)==1 && return nothing
       Threads.@threads for i=1:2
       (i==1) && recmt(view(target, 2:( length(target)>>1)))
       (i==2) && recmt(view(target, (1+ length(target)>>1):length(target)))
       end
       nothing
       end

julia> recmt(target);target
5-element Array{Int64,1}:
 1
 1
 2
 2
 2

Naively, I would have hoped that julia has a global pool of worker threads, and each @threads generates new tasks that get consumed by the workers (and the onus is on me to choose appropriate thresholds for splitting). So what am I doing wrong here?

No. Not yet.

So the right thing to do is set up a channel and worker threads, and push work packages into the channel?

Are you aware of any packages that do the boilerplate for this, in the case of “almost embarassingly parallel” problems, like e.g. quicksort?