ANN: MPI.jl v0.10.0: new build process and CUDA-aware support

I have just tagged a new version of MPI.jl. Though the user-facing interface is largely the same, the internals have been extensively reworked to use the C API instead of the Fortran one. As a result, the build process is much simpler: it no longer requires CMake or a Fortran compiler. It also directly supports CUDA-aware MPI libraries, allowing CuArrays to be passed directly as buffers (thanks to Seyoon Ko).

I would greatly appreciate it if people could try it out, especially on different clusters and with different MPI implementations.


Is it worth sharing this on the OpenMPI mailing list? I would guess you work closely with those guys anyway.

Thank you @simonbyrne! Here are some code examples using the new version of MPI.jl.


Basic Send and Recv! work fine on an ordinary Linux cluster with OpenMPI. Thanks!
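For anyone who wants to repeat this check on their own cluster, here is a minimal sketch of a blocking point-to-point test. It assumes the positional argument order (buffer, peer rank, tag, communicator) for MPI.Send and MPI.Recv!; newer releases may prefer a different calling form, so check the documentation for your installed version. The buffer length N and tag 0 are arbitrary choices for illustration.

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)     # 0-based rank of this process
nranks = MPI.Comm_size(comm)   # total number of processes

N = 4  # arbitrary buffer length for illustration

if nranks >= 2
    if rank == 0
        # Rank 0 sends a small host array to rank 1 with tag 0.
        send_buf = collect(1.0:N)
        MPI.Send(send_buf, 1, 0, comm)
    elseif rank == 1
        # Rank 1 receives into a preallocated buffer of matching type and size.
        recv_buf = Vector{Float64}(undef, N)
        MPI.Recv!(recv_buf, 0, 0, comm)
        println("rank 1 received ", recv_buf)
    end
end

MPI.Barrier(comm)
MPI.Finalize()
```

Launch it with at least two processes using your MPI launcher (for example, mpiexec with -n 2); with a single process the exchange is skipped.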

Nice! Is there a way to partition a big CuArray into p parts, as in a slab decomposition?

I’m working on it based on barche/MPIArrays.jl. I think it can be released some time this year.
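Until then, the index bookkeeping for a slab decomposition can be done by hand. The sketch below is independent of MPIArrays.jl and of MPI.jl itself; slab_range is a hypothetical helper written for illustration, and it assumes the split is along the last dimension with the remainder spread over the lowest ranks. It is shown on an ordinary Array; applying the same ranges to a CuArray is an assumption that has not been tested here.

```julia
# Hypothetical helper (not part of MPI.jl or MPIArrays.jl): the index range
# owned by `rank` (0-based, as in MPI) when `n` indices are split over `p` ranks.
function slab_range(n::Integer, p::Integer, rank::Integer)
    0 <= rank < p || throw(ArgumentError("rank must satisfy 0 <= rank < p"))
    base, extra = divrem(n, p)
    # The first `extra` ranks each own one additional index.
    lo = rank * base + min(rank, extra) + 1
    hi = lo + base - 1 + (rank < extra ? 1 : 0)
    return lo:hi
end

# Example: split a 4×4×10 array into p = 3 slabs along the last dimension.
A = reshape(collect(1.0:160.0), 4, 4, 10)
p = 3
slabs = [view(A, :, :, slab_range(size(A, 3), p, r)) for r in 0:p-1]

for (r, s) in zip(0:p-1, slabs)
    println("rank ", r, " owns a slab of size ", size(s))
end
```

Splitting along the last dimension keeps each slab contiguous in Julia's column-major layout, which is convenient when a slab is handed to MPI as a send or receive buffer.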

You may be interested in the python lib.

@rveltz This looks interesting. Thanks for the reference!

I have access to a DGX-1 GPU system. I’m not sure how much time I can get on it.
If there are any tests which could be run for this package I could give it a try.

Do you know if it has a CUDA-aware MPI? If so, it would be good to run the test suite with JULIA_PROJECT set to [pkgdir]/test/cudaenv.

@kose-y and @simonbyrne, a big thanks for making CUDA-aware MPI available to the Julia community! Unfortunately, when I tried CUDA-aware MPI with MPI.jl on two different systems, it failed in both cases. Could you have a look at this post where I reported the errors? It would be fantastic if I got it to work before the AGU conference this weekend… :)


Is there any news about Distributed Arrays with MPI, or mixing CuArrays with MPI?

You will need to build and link against a CUDA-aware MPI implementation, but other than that, CuArrays should work with MPI.jl:

There is a proof-of-concept package of distributed arrays:

but other than that, no. It really depends on what sort of functionality you want; e.g., we’ve built our own to provide support for ghost elements, but it makes a lot of assumptions about data layout, etc.

I see. Ideally, I want to perform FFTs on multiple GPUs…

There is some discussion here:

Yes! That is exactly what I am looking for.

As mentioned in the issue, it would be great to add GPU support to PencilFFTs. The MPI-heavy part of the code was recently refactored, and hopefully, extending things to work with CuArrays should not take too much work.

I personally don’t have any experience with CUDA-aware MPI, and I don’t have access to multi-GPU systems (that I’m aware of), so any help with this is most welcome!