Julia’s default multi-processing system, Distributed, is based on a different paradigm than MPI, the de facto standard in HPC.
I’m curious how many people use each of these two approaches to distributed computation. If you use both of them frequently, please select both.
I would also appreciate it if people could share their experience with these two different architectures.
I’m afraid I don’t have any up-to-date numbers to support my claim. I did some testing two or three years ago, and it was no contest: MPI won handily. However, YMMV. It depends on the granularity of the computation. Little communication and lots of computation may well favor Distributed.
I am not familiar with DistributedNext. I’ll be curious to see what’s there.
I use both when developing Dagger.jl, because users might have a reason for using one or the other. Some users want the dynamism and flexibility of Distributed, while others want the raw performance and scalability of MPI. Some users want both, and use MPIClusterManager, and are thus using both at the same time!
For reference, in Dagger we provide high-level abstractions (like our DArray, and our Datadeps parallel-algorithm framework) that don’t specify one or the other, but let users tell Dagger which to use, while providing equivalent semantics regardless of which option is chosen. This means the question “Does library XYZ support Distributed or MPI?” doesn’t really matter when the library supports Dagger; the difference is handled behind the scenes, and no code has to be rewritten.
(Also, we allow selecting between Distributed and DistributedNext, so regardless of which Distributed-compatible library users choose, everything “just works”)
I’d definitely say so! MPI makes you deal with details like parallel synchronization and coordination, data transfer/serialization, managing concurrency, etc. - all complicated things that even HPC experts struggle with.
Dagger (Datadeps specifically) takes the path of letting you express your program as a serial sequence of operations on a partitioned array, which Dagger then parallelizes for you based on small data-dependency annotations. Dagger handles the previously mentioned complexities for you: it uses MPI efficiently for data transfers, automatically schedules your program across ranks, and manages concurrency and program ordering. You still get the performance you expect from a well-written MPI program, without the mental overhead of making the MPI calls (and everything else) yourself.
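To give a flavor of what that looks like, here’s a minimal sketch of the Datadeps style of programming. The kernel and data are made up for illustration, and the exact names (`Dagger.spawn_datadeps`, the `In`/`Out` annotations) reflect my reading of the current Dagger docs, so treat it as illustrative rather than canonical:

```julia
using Dagger

# Plain serial kernel that Dagger will run as a task.
smooth!(dst, src) = (dst .= (src .+ circshift(src, 1)) ./ 2; dst)

blocks_in  = [rand(1_000) for _ in 1:8]
blocks_out = [zeros(1_000) for _ in 1:8]

# The loop is written serially; the In/Out annotations describe which
# arguments each task reads or writes, and Dagger runs the independent
# tasks in parallel (across threads, Distributed workers, or MPI ranks,
# depending on how it is configured).
Dagger.spawn_datadeps() do
    for i in 1:8
        Dagger.@spawn smooth!(Dagger.Out(blocks_out[i]), Dagger.In(blocks_in[i]))
    end
end
```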
I used Distributed largely because most of my work is embarrassingly parallel (so MPI’s communication advantages are moot) and my collaborators are not great with programming; Distributed is basically a better version of R’s foreach and doParallel packages.
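For embarrassingly parallel work like that, the whole Distributed workflow boils down to something like this (a generic sketch, not my actual code; `simulate` is just a placeholder for the real per-task work):

```julia
using Distributed
addprocs(4)   # local workers; on a cluster you would use a ClusterManager instead

# Placeholder for the real per-task work; must be defined on every worker.
@everywhere simulate(i) = sum(sin, range(0, i; length=10_000))

# pmap farms the independent tasks out to the workers and collects the
# results, much like foreach/doParallel in R.
results = pmap(simulate, 1:100)
```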
I’ve never done raw MPI in Julia, but I use MPIManager a fair bit to set up the workers via MPI and then use Distributed-style calls on top (this also gets you faster inter-worker communication, plus the benefits of CUDA-aware MPI for GPU work if the cluster is set up right).
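If anyone wants to try that setup, it looks roughly like this; I’m writing the constructor arguments from memory, so check the MPIClusterManagers.jl README for the exact options:

```julia
using MPIClusterManagers, Distributed

# Launch Julia workers under MPI and register them as Distributed workers.
manager = MPIManager(np = 4)
addprocs(manager)

# Ordinary Distributed-style calls work on these workers; since they run
# inside an MPI job, they can also make MPI.jl calls among themselves.
@everywhere f(x) = x^2
results = pmap(f, 1:100)
```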
Performance aside, Distributed and MPI follow quite different paradigms. Some applications might benefit more from one than the other.
For instance, with Distributed it is straightforward to implement a replicated-workers model, while with MPI it is easier to use the many common communication patterns of scientific computing, since they are already implemented for you (and with good hardware support).
I would say that virtually any distributed algorithm can be implemented with either approach, but one can be more practical than the other depending on your particular use case.
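As a concrete example of the contrast: a global reduction is a one-liner with MPI’s collectives, whereas with Distributed you typically compose it from Julia-level primitives yourself. Two rough sketches (the partial-sum computation is just a stand-in):

```julia
# MPI version (launch with mpirun): the collective is provided for you.
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
local_sum = sum(rand(1_000))               # each rank's partial result
total = MPI.Allreduce(local_sum, +, comm)  # reduction with hardware support
```

And the Distributed equivalent, where the reduction is expressed in ordinary Julia:

```julia
using Distributed
addprocs(4)
# Each iteration runs on some worker; (+) folds the partial results together.
total = @distributed (+) for i in 1:4
    sum(rand(1_000))
end
```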
There is a paper from 2021 by Amal Rizvi and Kyle C. Hale titled “A Look at Communication-Intensive Performance in Julia”. It does not fully compare Distributed and MPI, but you might find it interesting. There is also a discussion on a similar topic from 2021 here.