Distributed.jl, DistributedArrays.jl on an InfiniBand cluster

This may not be the optimal workflow. In particular, Distributed doesn’t offer a tree-based reduce, which might make mapreduce operations (i.e. distributed for loops) seriously slow compared to MPI. It’s really hard to match MPI if one needs to transfer large data across cores.
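
For reference, here is a minimal sketch of the kind of distributed for loop with a reduction I mean. It assumes two local worker processes added with `addprocs(2)` (on a cluster you would pass a machine list or use a cluster manager instead), and the range and the `i^2` body are just placeholders. As far as I understand, each worker reduces its own chunk and the partial results are then combined on the calling process rather than in a tree.

```julia
using Distributed

# Assumption: two local workers stand in for cluster nodes.
if nworkers() == 1
    addprocs(2)
end

# Distributed for loop with a (+) reduction: the range is split into
# chunks, each worker sums its chunk, and the partial sums are combined
# on the calling process.
total = @distributed (+) for i in 1:1_000_000
    i^2
end

# Serial version of the same computation, for comparison.
serial_total = sum(i^2 for i in 1:1_000_000)

println(total == serial_total)
```

With only a handful of workers the final combine step is cheap; the concern is what happens when there are many workers or when each partial result is a large array.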