Multithreading for recursive operations

Thanks for that explanation and the link! This works splendidly and gives a nice API. The docs appear to be quite out of sync with master, though (I stopped reading at “Note: Currently, all tasks in Julia are executed in a single OS thread co-operatively.”).

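However, the scheduling is quite different from what I imagined. As advertised, partr appears to try to run the innermost parallelization opportunities concurrently. Here is an experiment with two threads and an expensive f(). Leaves that run on thread 1 record 1, 2, 3, … and leaves on the other thread record -1, -2, -3, …, so A shows which thread ran each leaf and in what order: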

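```julia
julia> const t1 = Ref(1); const t2 = Ref(1);

julia> function _parmap(f, A, lo=1, hi=length(A)+1)
           if hi == lo+1
               f()
               # leaves on thread 1 count up from 1, leaves elsewhere count down from -1
               if Threads.threadid() == 1
                   A[lo] = t1[]
                   t1[] += 1
               else
                   A[lo] = -t2[]
                   t2[] += 1
               end
           elseif hi > lo+1                   # guard: an empty range does nothing
               m = lo + (hi-lo)>>1
               t = @par _parmap(f, A, m, hi)  # spawn the right half as a new task
               _parmap(f, A, lo, m)           # handle the left half in this task
               wait(t)
           end
           nothing
       end

julia> t1[]=1; t2[]=1; A=zeros(Int,10); _parmap(f,A); A
10-element Array{Int64,1}:
  1
 -3
 -1
  2
 -2
  3
 -5
  4
 -4
  5
```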
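For anyone reproducing this on a released Julia: as far as I know, the corresponding spawning macro there (1.3 and later) is `Threads.@spawn`. The sketch below is a hypothetical variant, `parmap_threadids!`, of the function above. Instead of the global counters (with more than two threads every non-main thread would share and race on `t2`), each leaf simply records the id of the thread it ran on. Note that newer Julia versions can migrate tasks between threads, so this is the thread at the moment the leaf wrote its entry.

```julia
# Hypothetical helper: same recursive split as _parmap, using Threads.@spawn.
# Each leaf stores the id of the thread that executed it instead of a counter.
function parmap_threadids!(f, A, lo = 1, hi = length(A) + 1)
    if hi == lo + 1
        f()                          # the (expensive) work for this leaf
        A[lo] = Threads.threadid()   # thread running this leaf right now
    elseif hi > lo + 1               # empty ranges fall through and do nothing
        m = lo + (hi - lo) >> 1
        t = Threads.@spawn parmap_threadids!(f, A, m, hi)  # right half: new task
        parmap_threadids!(f, A, lo, m)                     # left half: this task
        wait(t)
    end
    return nothing
end
```

Start Julia with e.g. `julia --threads 2` and call `parmap_threadids!(f, zeros(Int, 10))` to see the thread assignment for your own f.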

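What I had originally imagined was a schedule that, in the limit of a very expensive f, is deterministic and gives A == [1, 2, 3, 4, 5, -1, -2, -3, -4, -5], i.e. thread 1 works through the left half in order while the other thread works through the right half.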
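If that ordering is what matters, one option is to not rely on the work-stealing scheduler for it and instead split the work explicitly: one task per contiguous chunk, each walking its chunk in order. The sketch below (`chunked_order!` is a hypothetical name) labels entries by chunk rather than by thread, because `Threads.@spawn` does not pin a task to a particular thread. Chunk 1 counts up from 1 and every other chunk counts down from -1, mirroring the labelling above.

```julia
# Hypothetical helper: split A into `nchunks` contiguous blocks, one task each.
# Within a block, entries are processed strictly in order, so the result does
# not depend on how the scheduler maps tasks to threads.
function chunked_order!(A, nchunks)
    n = length(A)
    tasks = map(1:nchunks) do c
        lo = div((c - 1) * n, nchunks) + 1   # first index of chunk c
        hi = div(c * n, nchunks)             # last index of chunk c
        sgn = c == 1 ? 1 : -1                # chunk 1 positive, others negative
        Threads.@spawn begin
            counter = 1
            for i in lo:hi
                A[i] = sgn * counter         # position within this chunk
                counter += 1
            end
        end
    end
    foreach(wait, tasks)
    return A
end
```

With `chunked_order!(zeros(Int, 10), 2)` the two halves are handled by separate tasks, and each half is filled in order regardless of thread count.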

I guess I’ll have to read more about the scheduler and learn to work with it. (I care about the schedule because I have reasonable guesses about which tasks can run concurrently without conflicts, plus a more expensive conflict-resolution step for when those guesses turn out to be wrong.)
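The "guess, then resolve conflicts" idea can be sketched as speculative execution with validation. This is a generic pattern, not something partr provides, and everything below is hypothetical: the state is a `Dict{Int,Int}`, and each job is assumed to be a function that takes a read-only snapshot and returns `(touched, updates)`, where `touched` lists every key it read or wrote. All jobs run concurrently on the same snapshot. If no key was touched by more than one job, their updates are committed. Otherwise everything is discarded and the jobs are rerun serially (the expensive fallback).

```julia
# Expensive fallback: rerun every job, one after another, on the live state.
function serial!(state::Dict{Int,Int}, jobs)
    for job in jobs
        _, updates = job(state)
        merge!(state, updates)
    end
    return true                      # signals that a conflict occurred
end

# Optimistic path: run all jobs concurrently on a shared read-only snapshot,
# then commit their updates only if no key was touched by more than one job.
function speculate!(state::Dict{Int,Int}, jobs)
    snapshot = copy(state)           # jobs only read this, never mutate it
    outs = fetch.([Threads.@spawn job(snapshot) for job in jobs])
    seen = Set{Int}()
    for (touched, _) in outs
        # Conservative check: any shared key (even read/read) counts as a conflict.
        isempty(intersect(seen, touched)) || return serial!(state, jobs)
        union!(seen, touched)
    end
    for (_, updates) in outs
        merge!(state, updates)       # footprints are disjoint, so order is irrelevant
    end
    return false
end

# Example job constructor (hypothetical): read key k, write k + 1 back to it.
incr(k) = snapshot -> (Set([k]), Dict(k => get(snapshot, k, 0) + 1))
```

`speculate!(state, [incr(1), incr(2)])` commits both updates concurrently, while `[incr(1), incr(1)]` collides on key 1 and falls back to the serial path. The check is only sound if each job's `touched` set really covers everything it reads and writes.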

As for overhead, tasks of about 100 µs appear to be an acceptable size, while 30 µs is too small (on my machine).
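One way to act on that measurement is a grain-size cutoff in the recursive split: below some range length, stop spawning and just loop. The sketch below uses hypothetical names (`parmap_grain!`, `grain`) and assumes f maps an index to a value. If one call of f took about 10 µs (a made-up figure), a `grain` of roughly 10 would put each leaf task near the 100 µs that seemed acceptable here.

```julia
# Hypothetical variant of the recursive split with a cutoff: ranges of at most
# `grain` elements are processed serially in the current task, so every spawned
# task carries enough work to amortize its scheduling overhead.
function parmap_grain!(f, A, lo = 1, hi = length(A) + 1; grain = 64)
    grain >= 1 || throw(ArgumentError("grain must be at least 1"))
    if hi - lo <= grain
        for i in lo:hi-1
            A[i] = f(i)              # serial leaf: no task creation
        end
    else
        m = lo + (hi - lo) >> 1
        t = Threads.@spawn parmap_grain!(f, A, m, hi; grain = grain)  # right half
        parmap_grain!(f, A, lo, m; grain = grain)                     # left half
        wait(t)
    end
    return A
end
```

For example, `parmap_grain!(i -> i^2, zeros(Int, 1000); grain = 16)` fills the array with squares while creating far fewer tasks than one per element.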