I’d like to make sure I understand how Julia hooks into CUDA-aware MPI. Basically, is it true that when sending/receiving Julia objects that may have a CuArray somewhere in them, the CuArray is never moved to the CPU and is instead passed directly from GPU to GPU?
Here’s an example script:
using MPI
MPI.Init()
using CuArrays
using CUDAdrv
using CUDAnative

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
device!(rank)
@info "MPI process $rank is using $(device())"

if rank == 0
    dat = (x = cu(ones(4,4)),)
    MPI.send(dat, 1, 0, comm)
else
    dat, = MPI.recv(0, 0, comm)
    @show dat
end
If I run this script with mpiexec -n 2 on a machine with 2 GPUs, does the CuArray inside dat ever have to go through the CPU? Is there a way to check? Thanks.