I think it’s less the communication mechanism and more the programming model. Julia’s Distributed lets you use the faster interconnects as well if you ask it to. But Distributed’s programming model is so “top-down” that it’s really tailored to mapreduce-style calls, which is great for simple data-parallel computing, while MPI just makes it easier to do the “minimal communication + local compute” style that people want for big PDEs. You could in theory do that same style of compute with Distributed by turning on its use of the fast interconnects (something Valentin made work), spawning one process per node, and encoding all of the communication into those processes (i.e. spawn the processes once so they last the entire run, and have them do the inter-process communication themselves), and you’d essentially have MPI. But… that would be going so far out of your way to write MPI-like code in Distributed that you might as well just use MPI.
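For concreteness, here’s a rough sketch of what that MPI-in-Distributed style would look like (the function name, channel names, and the trivial 1D two-worker decomposition are all made up for illustration, not anyone’s actual package):

```julia
using Distributed
addprocs(2)  # in practice, one worker per node via a ClusterManager

@everywhere using Distributed

@everywhere function local_sweep!(u, nsteps, send_ch, recv_ch, myedge)
    for _ in 1:nsteps
        put!(send_ch, u[myedge, :])   # ship my boundary row to the neighbor
        halo = take!(recv_ch)         # receive the neighbor's boundary row
        # ... use `halo` as the ghost row in a purely local stencil update ...
    end
    return sum(u)  # return something small; the big array stays on the worker
end

# One RemoteChannel per direction, hosted on the receiving worker.
a_to_b = RemoteChannel(() -> Channel{Vector{Float64}}(1), workers()[2])
b_to_a = RemoteChannel(() -> Channel{Vector{Float64}}(1), workers()[1])

# Spawn once; each long-lived worker owns its block for the whole run, MPI-style.
fa = remotecall(local_sweep!, workers()[1], zeros(128, 128), 100, a_to_b, b_to_a, 128)
fb = remotecall(local_sweep!, workers()[2], zeros(128, 128), 100, b_to_a, a_to_b, 1)
fetch(fa); fetch(fb)
```

And that’s the point: you’re hand-rolling persistent ranks and point-to-point sends out of RemoteChannels, which is exactly what MPI already gives you.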
Note that the Julia Lab is holding weekly discussions on parallel programming models to try to find interesting new answers in this space.
Ehh, I don’t see oneAPI as the answer. I think the problem of heterogeneous compute is different and needs a different solution… long story though.
In the demarcations that I mention in this 18.337 lecture:
oneAPI is an instantiation of a declarative heterogeneous SPMD kernel generation model. Those kinds of things are very powerful when you have very regular compute problems (e.g. LU factorization or a PDE stencil), but they are limited in the codes that they can express well. It’s a path that should be pushed forward and will be useful for writing kernels, but I don’t think it’s the future of democratized HPC.
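To make the “regular compute” case concrete, here’s roughly what that kernel style looks like in Julia via KernelAbstractions.jl, which is one instantiation of the same declarative SPMD kernel-generation model (the kernel name and grid setup are my own illustration, assuming the v0.9-style API):

```julia
using KernelAbstractions

# 5-point Laplacian stencil: one work-item per interior grid point.
@kernel function laplacian!(du, @Const(u))
    i, j = @index(Global, NTuple)
    du[i + 1, j + 1] = u[i, j + 1] + u[i + 2, j + 1] +
                       u[i + 1, j] + u[i + 1, j + 2] - 4u[i + 1, j + 1]
end

n = 256
u, du = rand(n, n), zeros(n, n)
backend = KernelAbstractions.get_backend(u)      # CPU here; a CuArray would select the CUDA backend
laplacian!(backend)(du, u; ndrange = (n - 2, n - 2))
KernelAbstractions.synchronize(backend)
```

Perfectly regular data access, one kernel over an N-dimensional range, retargetable across vendors; that’s the sweet spot of the model, and also its boundary.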