CUDA-aware MPI check fails, but code still runs on multiple GPUs

I am running the library ImplicitGlobalGrid.jl on a server to learn about multi-GPU computing. The server has MPI and CUDA, but when I try MPI.has_cuda() I get false, which tells me that the MPI is not CUDA-aware.
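
For reference, a minimal sketch of that check (assuming MPI.jl is already installed in the active project environment):

```julia
using MPI

MPI.Init()
@show MPI.has_cuda()   # returns false here, i.e. this MPI build is not CUDA-aware
```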

However, when I run one of the GPU examples from the repo above and specify that I want it to run on 2 GPUs using salloc, it runs. You can see the results below (with a coarse grid, since I wanted it to run fast).

If a system does not have CUDA-aware MPI, can it still run on multiple GPUs? I was actually hoping to get an error so I could see what was wrong, but no error occurred.

[fpoulin@cdr353 examples]$ mpiexec -np 2 julia --project diffusion3D_multigpu_CuArrays_novis.jl 
┌ Warning: The NVIDIA driver on this system only supports up to CUDA 11.1.0.
│ For performance reasons, it is recommended to upgrade to a driver that supports CUDA 11.2 or higher.
└ @ CUDA ~/.julia/packages/CUDA/lwSps/src/initialization.jl:42
┌ Warning: The NVIDIA driver on this system only supports up to CUDA 11.1.0.
│ For performance reasons, it is recommended to upgrade to a driver that supports CUDA 11.2 or higher.
└ @ CUDA ~/.julia/packages/CUDA/lwSps/src/initialization.jl:42
Global grid: 30x16x16 (nprocs: 2, dims: 2x1x1)

I have learned something perhaps obvious: even if a system does not have CUDA-aware MPI, it can still run on multiple GPUs; however, the efficiency will in general not be as good. Sorry for the bother.

CUDA-aware MPI just determines whether or not you can use CuArrays directly as MPI communication buffers: if your MPI is not CUDA-aware, you will have to first copy the contents to an Array and use that as the buffer.
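
To make that concrete, here is a minimal sketch of the two paths (my own illustration, assuming exactly two MPI ranks and the positional MPI.Send/MPI.Recv! signatures; newer MPI.jl releases use keyword-argument forms instead):

```julia
using MPI, CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

d_data = CUDA.fill(Float64(rank), 8)    # the data lives on the GPU

if MPI.has_cuda()
    # CUDA-aware MPI: the CuArray itself can be used as the communication buffer.
    if rank == 0
        MPI.Send(d_data, 1, 0, comm)
    else
        MPI.Recv!(d_data, 0, 0, comm)
    end
else
    # Not CUDA-aware: stage the data through a host Array.
    h_data = Array(d_data)              # device -> host copy
    if rank == 0
        MPI.Send(h_data, 1, 0, comm)
    else
        MPI.Recv!(h_data, 0, 0, comm)
    end
    copyto!(d_data, h_data)             # host -> device copy
end

MPI.Finalize()
```

The extra device-host copies in the second branch are exactly where the efficiency loss comes from when the MPI build is not CUDA-aware.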


Hi @francispoulin, thank you for your enthusiastic feedback about ImplicitGlobalGrid.jl. As you figured out, CUDA-aware MPI is an additional feature that allows GPU array pointers to be exchanged directly via MPI (bypassing the explicit copying of buffers to the host prior to exchanging host arrays with MPI). In ImplicitGlobalGrid, non-CUDA-aware MPI is the default implementation, where special care was taken to optimise pipelining for optimal performance. Every MPI process controls one GPU, so you can run multi-GPU applications at scale on supercomputers even without CUDA-aware capabilities.
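
As far as I understand, ImplicitGlobalGrid takes care of this rank-to-GPU mapping itself when the global grid is initialized; just to illustrate the general idea with plain MPI.jl and CUDA.jl (a rough sketch, not ImplicitGlobalGrid's actual selection logic, which would typically use the node-local rank):

```julia
using MPI, CUDA

MPI.Init()
rank  = MPI.Comm_rank(MPI.COMM_WORLD)
ngpus = length(CUDA.devices())

# Bind this MPI rank to one of the visible GPUs (round-robin over device indices).
# A production setup would use the node-local rank rather than the global one.
CUDA.device!(rank % ngpus)
```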

If the MPI build on your cluster supports CUDA-awareness in the future, then exporting IGG_CUDAAWARE_MPI=1 will enable it.
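
For example (a sketch only: setting the variable in the job script with export IGG_CUDAAWARE_MPI=1 before launching Julia is the documented route, and setting it from within Julia before the package is loaded should be equivalent):

```julia
# Assumes a CUDA-aware MPI build is available on the system.
ENV["IGG_CUDAAWARE_MPI"] = "1"   # set before ImplicitGlobalGrid is loaded
using ImplicitGlobalGrid
```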


Note that some of the ImplicitGlobalGrid-related capabilities, implementations, and synergies with ParallelStencil.jl will be discussed in the JuliaCon workshop on Solving differential equations in parallel on GPUs on Friday, July 23, 2021. Tune in if curious 🙂


Thank you @simonbyrne and @luraess for your helpful feedback.

I will most certainly check out the Solving DEs in Parallel on GPUs workshop on Friday!
