ANN: MPI.jl v0.10.0: new build process and CUDA-aware support

I have just tagged a new version of MPI.jl. Though the user-facing interface is largely the same, there has been extensive work underneath to use the C API internally (instead of the Fortran one). As a result, the build process is much simpler: it no longer requires CMake or a Fortran compiler. It also directly supports CUDA-aware MPI libraries, allowing CuArrays to be passed directly as buffers (thanks to Seyoon Ko).

I would greatly appreciate it if people could try it out, especially on different clusters and with different MPI implementations.

Is it worth sharing this on the OpenMPI mailing list? I would guess you work closely with those guys anyway.

I just noticed this is in Julia at Scale!
Picture me doing a happy dance. Or maybe my Miata doing doughnuts (see avatar image).

Thank you @simonbyrne! Here are some code examples using the new version of MPI.jl.

Basic Send and Recv! work fine on an ordinary Linux cluster with OpenMPI. Thanks!
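
For anyone who wants to repeat this kind of check, a minimal sketch of a blocking Send/Recv! exchange between two ranks could look like the following. It assumes the positional argument order (buffer, rank, tag, communicator); newer MPI.jl releases also document keyword forms, so check the documentation for your installed version. The buffer length and tag are arbitrary illustrative values, not taken from the test above.

```julia
# Minimal point-to-point sketch. Launch with at least two ranks,
# e.g. `mpiexec -n 2 julia send_recv.jl` (script name is hypothetical).
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)      # zero-based id of this process
nranks = MPI.Comm_size(comm)    # total number of processes

N = 4      # illustrative buffer length
tag = 0    # illustrative message tag

if nranks >= 2
    if rank == 0
        # Rank 0 sends a small vector to rank 1
        sendbuf = collect(1.0:N)
        MPI.Send(sendbuf, 1, tag, comm)
    elseif rank == 1
        # Rank 1 receives into a preallocated buffer of matching type and size
        recvbuf = Vector{Float64}(undef, N)
        MPI.Recv!(recvbuf, 0, tag, comm)
        println("rank 1 received ", recvbuf)
    end
end

MPI.Barrier(comm)
MPI.Finalize()
```

With fewer than two ranks the exchange is skipped, so the script still exits cleanly when run as a single process.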

Nice! Is there a way to partition a big CuArray into p parts like the slab decomposition?
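
To make the question concrete, here is a rough sketch of the index arithmetic behind a slab decomposition. It is not an MPI.jl feature: `slab_range` is a hypothetical helper, and an ordinary `Array` stands in for the CuArray, on the assumption that the same index ranges would apply to a GPU array.

```julia
# Split the indices 1:n into p contiguous slabs whose lengths differ by at most one.
# `r` is a zero-based part index (like an MPI rank), with 0 <= r < p.
function slab_range(n::Integer, p::Integer, r::Integer)
    q, extra = divrem(n, p)            # base slab length and leftover indices
    lo = r * q + min(r, extra) + 1     # the first `extra` slabs get one extra index
    hi = lo + q - 1 + (r < extra ? 1 : 0)
    return lo:hi
end

A = reshape(collect(1.0:40.0), 4, 10)   # stand-in for a large array
p = 3                                    # number of parts

# One view per part, slicing along the last dimension
slabs = [view(A, :, slab_range(size(A, 2), p, r)) for r in 0:p-1]

for (i, s) in enumerate(slabs)
    println("slab ", i, " has size ", size(s))
end
```

Slicing along the last dimension keeps each slab contiguous in Julia's column-major layout, which is usually what an MPI buffer needs.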

I’m working on it based on barche/MPIArrays.jl. I think it can be released some time this year.

You may be interested in the Python lib.

@rveltz This looks interesting. Thanks for the reference!

I have access to a DGX-1 GPU system. I’m not sure how much time I can get on it.
If there are any tests which could be run for this package I could give it a try.

Do you know if it has a CUDA-aware MPI? If so, it would be good to run the test suite with JULIA_PROJECT set to [pkgdir]/test/cudaenv.

@kose-y and @simonbyrne, a big thanks for making CUDA-aware MPI available to the Julia community! Unfortunately, when I tried CUDA-aware MPI with MPI.jl on two different systems, it failed in both cases. Could you have a look at this post where I reported the errors? It would be fantastic if I got it to work before the AGU conference this weekend… 🙂

Hi,

Is there any news about Distributed Arrays with MPI, or mixing CuArrays with MPI?

You will need to build and link against a CUDA-aware MPI implementation, but other than that, CuArrays should work with MPI.jl:
https://juliaparallel.github.io/MPI.jl/stable/usage/#CUDA-aware-MPI-support-1
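
As a rough illustration, passing a device array straight to a collective might look like the sketch below. It assumes a CUDA-aware MPI build, one usable GPU per rank, and the CUDA.jl package (the successor to CuArrays); the buffer size is arbitrary, and the in-place `Allreduce!` form should be checked against the documentation for your MPI.jl version.

```julia
using MPI
using CUDA   # assumption: CUDA.jl, the successor to the CuArrays package

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Device buffer filled with this rank's id
buf = CUDA.fill(Float64(rank), 4)
CUDA.synchronize()   # make sure the fill has finished before MPI reads the buffer

# In-place sum across all ranks, passing the device array directly as the buffer
MPI.Allreduce!(buf, +, comm)

# Every element should now hold 0 + 1 + ... + (nranks - 1)
expected = sum(0:nranks-1)
@assert all(Array(buf) .== expected)
println("rank ", rank, ": allreduce on a device buffer gave ", expected)
```

If the MPI library is not CUDA-aware, a call like this typically crashes rather than returning an error, so it is worth trying on a small buffer first.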

There is a proof-of-concept package of distributed arrays:


but other than that, no. It really depends on what sort of functionality you will want, e.g. we’ve built our own to provide support for ghost elements (https://github.com/CliMA/ClimateMachine.jl/blob/master/src/Arrays/MPIStateArrays.jl), but it makes a lot of assumptions about data layout, etc.

I see. Ideally, I want to perform FFTs on multiple GPUs…

There is some discussion here: https://github.com/jipolanco/PencilFFTs.jl/issues/3

Yes! That is exactly what I am looking for.

As mentioned in the issue, it would be great to add GPU support to PencilFFTs. The MPI-heavy part of the code was recently refactored, and hopefully, extending things to work with CuArrays should not take too much work.

I personally don’t have any experience with CUDA-aware MPI, and I don’t have access to multi-GPU systems (that I’m aware of), so any help with this is most welcome!