I’d like to make sure I understand how Julia hooks into CUDA-aware MPI. Basically, is it true that when sending or receiving Julia objects that may have a CuArray somewhere inside them, the CuArray is never moved to the CPU and is instead passed directly from GPU to GPU?
Here’s an example script:
using MPI
MPI.Init()
using CuArrays
using CUDAdrv
using CUDAnative
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
# Bind each rank to its own GPU (assumes ranks 0 and 1 map to devices 0 and 1)
device!(rank)
@info "MPI process $rank is using $(device())"
if rank == 0
    dat = (x = cu(ones(4,4)),)
    MPI.send(dat, 1, 0, comm)
else
    dat, = MPI.recv(0, 0, comm)
    @show dat
end
When I run this script with mpiexec -n 2 on a machine with 2 GPUs, does the CuArray inside dat ever have to go through the CPU? Is there a way to check? Thanks.