Parallel for triangular domain

Apologies for leading with a wall of text; I'm just feeling chuffed at the moment.

Background: "all vs. all" comparison, where every element in a set is compared against every other element and a "distance" is computed for each pair, comes up at a significant scale in bioinformatics.
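For anyone who has not met the pattern, here is a minimal serial sketch of it. The `dist` function is a hypothetical placeholder (a real one would compare sequences or similar), and `allvsall` is a name made up for this illustration. Because the distance is symmetric, only pairs with `j > i` are visited, which is what makes the iteration domain triangular. (The timing loops later in this post use `j in i:N`, so they also touch the diagonal.)

```julia
# Hypothetical placeholder distance; stands in for a real pairwise metric.
dist(a, b) = abs(a - b)

# Serial all-vs-all over the strict upper triangle: each unordered pair once.
function allvsall(xs)
    n = length(xs)
    out = Float64[]
    for i in 1:n
        for j in (i + 1):n      # inner range shrinks as i grows
            push!(out, dist(xs[i], xs[j]))
        end
    end
    return out
end

d = allvsall([1.0, 4.0, 6.0])   # n = 3 elements give n*(n-1)/2 = 3 pairs
```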

Needless to say, Julia finishes in a fraction of the time the incumbent implementation took,
and that was before giving it multiple cores.

Calling julia with -p n and decorating the loop with @parallel gives it the expected boost, no problem.

My topic here is this:
I know the underlying domain the for loop is iterating over is not rectangular.
@parallel does not.
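To make the imbalance concrete, here is a small sketch. It assumes the outer range `1:N` is cut into `p` equal contiguous chunks, which is my understanding of what @parallel does but is stated here as an assumption. `rowwork` and `evenchunks` are names made up for this illustration, not anything in Base.

```julia
# Inner-loop iterations done for outer rows lo:hi when the inner loop is `for j in i:N`.
rowwork(lo, hi, N) = sum(N - i + 1 for i in lo:hi)

# Hypothetical even splitter: p contiguous chunks of (nearly) equal length.
function evenchunks(N, p)
    bounds = [round(Int, k * N / p) for k in 0:p]
    return [(bounds[k] + 1):bounds[k + 1] for k in 1:p]
end

N, p = 50000, 4
chunks = evenchunks(N, p)
work = [rowwork(first(c), last(c), N) for c in chunks]
share = work ./ sum(work)   # fraction of the total work each chunk receives
```

With four equal chunks the shares come out close to 7/16, 5/16, 3/16 and 1/16, so the first worker does roughly seven times the work of the last one and everyone waits on it.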

A standard response may be to just put the loop body in a function and @spawn it with chosen intervals, but that loses the ease and charm of @parallel, not to mention making me personally responsible for moving the data around.

Just as we have an arbitrary (reducer), it might be lovely to accept arbitrary interval generators, but that goes beyond scratching my immediate itch, which is lower triangular matrices.
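As a rough idea of what an interval generator for the triangular case could look like, here is a sketch. It is not the code in my modified macro, and `balancedchunks` is a hypothetical name. It assumes the inner loop is `for j in i:N`, so rows 1:r cost r*(2N - r + 1)/2 iterations in total, and it solves that quadratic for the row where each worker's share ends.

```julia
# Hypothetical interval generator: split 1:N into p contiguous chunks holding
# roughly equal numbers of inner iterations, assuming row i costs N - i + 1.
function balancedchunks(N, p)
    total = N * (N + 1) / 2
    bounds = Int[]
    for k in 0:p
        # Solve r*(2N - r + 1)/2 = k*total/p for r (smaller root of the quadratic).
        disc = max((2N + 1)^2 - 8 * k * total / p, 0.0)
        push!(bounds, round(Int, ((2N + 1) - sqrt(disc)) / 2))
    end
    return [(bounds[k] + 1):bounds[k + 1] for k in 1:p]
end

N, p = 50000, 4
chunks = balancedchunks(N, p)
# Iterations per chunk, to check that the split is close to even.
work = [sum(N - i + 1 for i in c) for c in chunks]
```

The early chunks are short (many long rows each) and the last chunk is long (many short rows), so each worker ends up with close to a quarter of the iterations.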

I wanted to be able to say @parallelLT and be done with it.

And the beauty of Julia is that I can do exactly that.

I pulled from GitHub, found where @parallel is defined in base/distributed/macros.jl,
and made my own new version.

BOOM!


julia -p4
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.2484 (2017-11-09 21:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 26eb512* (0 days old master)
|__/                   |  x86_64-linux-gnu


using SharedArrays

include("/bla/bla/macros_TEC_minimal.jl")
@parallelLT (macro with 1 method)



N=50000;A=SharedArray{Int8}(((N*N)>>1)+N); 
@time @sync @parallel for i in 1:N for j in i:N A[(j*N+i)>>1]=myid() end end

211.409553 seconds (95.24 k allocations: 4.826 MiB)


N=50000;A=SharedArray{Int8}(((N*N)>>1)+N); 
@time @sync @parallelLT for i in 1:N for j in i:N A[(j*N+i)>>1]=myid() end end

137.210585 seconds (846.53 k allocations: 43.710 MiB, 0.01% gc time)


Not yet ready for prime time. I can't even run the program I wrote in Julia 0.5
a few weeks ago on this freshly built 0.7, but the promise is there.

If this is the sort of thing people think would be useful to have in Julia,
I will work towards making it a pull request.

I'm also really curious about what people think about slotting in an interval generator,