How to combine MPI non-blocking Isend/Irecv and Julia tasks

I would like to hide the MPI calls needed to wait for a non-blocking Isend / Irecv! by using the Julia asynchronous programming capabilities. I can imagine 3 main ways of doing this (see code below), but I am not certain which one is the way to go.

Which one is the best? Are there other better alternatives? Any help would be very welcome!

using MPI
MPI.Init()
comm = MPI.COMM_WORLD

# Start a non-blocking exchange
N = 4
send_mesg = Array{Float64}(undef, N)
recv_mesg = Array{Float64}(undef, N)
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst = mod(rank+1, size)
src = mod(rank-1, size)
fill!(send_mesg, Float64(rank))
rreq = MPI.Irecv!(recv_mesg, src,  src+32, comm)
print("$rank: Sending   $rank -> $dst = $send_mesg\n")
sreq = MPI.Isend(send_mesg, dst, rank+32, comm)

# Option 1
# Will the underlying C call to MPI_Waitall block the entire Julia process?
# Will code have a chance to run between @async and wait before the task finishes?
t = @async begin
  stats = MPI.Waitall!([rreq, sreq])
  print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(t)

# Option 2
# Is the while loop efficient? Will it leave room for other tasks to run?
t = @async begin
  done = false
  while !done
   done, stats = MPI.Testall!([rreq, sreq])
  end
  print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(t)

# Option 3
# I would say that in this case other code can run between @task and schedule,
# but I would like to avoid calling schedule explicitly.
t = @task begin
  stats = MPI.Waitall!([rreq, sreq])
  print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(schedule(t))

It would be good to have a way to integrate MPI_Wait with the libuv event loop used by Julia tasks (green threads), so that Julia tasks can execute while MPI is waiting and so that the waiting task can wake up as soon as the request is available.

Using MPI.Waitall! (options 1 or 3) won’t work, because the underlying C call blocks the whole Julia thread until all requests complete, so no other task can run in the meantime. Using a spin loop (option 2) should work as long as you call yield() in the loop, but it is pretty inefficient: it consumes a lot of cycles testing over and over again.
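
A minimal sketch of such a yielding loop, for illustration only. Here `poll_until` is a hypothetical helper (not part of MPI.jl), and a `Ref` flag stands in for the MPI test so that the snippet runs without MPI:

```julia
# Cooperative polling: call `test()` until it returns true, handing control
# back to the Julia scheduler between attempts so other tasks can run.
# `poll_until` is a hypothetical helper name, not an MPI.jl function.
function poll_until(test; pause = 0.0)
    while !test()
        # yield() lets other runnable tasks proceed; an optional sleep
        # trades wake-up latency for fewer wasted cycles.
        pause > 0 ? sleep(pause) : yield()
    end
    return nothing
end

# Stand-in for the MPI request: it "completes" when the flag is set.
flag = Ref(false)
events = String[]

waiter = @async begin
    poll_until(() -> flag[])
    push!(events, "communication done")
end

yield()                            # let the waiter start polling
push!(events, "overlapped work")   # runs while the waiter keeps yielding
flag[] = true                      # the stand-in request completes
wait(waiter)
```

With the requests from the question, the closure would be something like `() -> first(MPI.Testall!([rreq, sreq]))`, assuming `Testall!` returns a `(done, statuses)` tuple as in the original post.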

The ideal thing would be if an MPI_Request could be converted to a file descriptor, since libuv can wait efficiently on file descriptors via poll, but I don’t see a standard way to do this (even though some MPI implementations may use file descriptors internally for asynchronous requests).

However, MPI_Wait and MPI_Waitall are thread-safe (provided MPI was initialized with MPI_THREAD_MULTIPLE), so that offers another option. You could spawn a (real) thread to wait on MPI, and when the wait completes you could use uv_async_send to notify the main libuv (Julia) event loop. See also here and here.

It would be nice if something like this were implemented in MPI.jl, since it’s rather low-level stuff that most users wouldn’t want to muck with directly, though now that Julia supports real threading it should be easier.

MPI also has something called a generalized request, which allows you to define custom mechanisms for asynchronous operations. I haven’t read the documentation closely yet, but this might ultimately be the best way to integrate MPI requests with Julia.

Thanks for your answers @stevengj!

And what about using @threadcall? (See the Multi-Threading section of the Julia manual.)

@threadcall on MPI_Wait or MPI_Waitall sounds like a good choice here, too. I didn’t actually know about that macro! Since it only works for ccall, you’ll have to re-implement MPI.Waitall! or submit a patch to MPI.jl; it might be reasonable to have a keyword argument to do this.
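
To illustrate how @threadcall behaves, here is a rough sketch that uses POSIX `usleep` from libc as a stand-in for the blocking MPI call. This assumes a POSIX system; the actual ccall signature for MPI_Waitall depends on the request and status types of your MPI implementation and is deliberately not shown:

```julia
# Stand-in for a blocking C call such as MPI_Waitall: POSIX usleep (libc).
# This is an assumption for illustration, not the real MPI call.
events = String[]

t = @async begin
    # @threadcall runs the C function on a worker thread, so only this task
    # blocks; the Julia scheduler keeps running other tasks meanwhile.
    @threadcall(:usleep, Cint, (Cuint,), 200_000)   # block for about 0.2 s
    push!(events, "blocking call returned")
end

yield()                            # let the task start its blocking call
push!(events, "overlapped work")   # runs while the C call is in progress
wait(t)
```

The same call written as a plain `ccall` inside `@async` would stall the scheduler for the full duration, which is the problem with options 1 and 3.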

In CLIMA we solve this as shown in https://github.com/CliMA/ClimateMachine.jl/blob/65a7e65cda475ceeac0d10ea6894f49e382db822/src/Arrays/MPIStateArrays.jl#L403-L507, with special consideration for KernelAbstractions and GPU computation.

I’ve added a PR for a @threadcall-based wait: “RFC: Define wait(req) to use threadcall” (JuliaParallel/MPI.jl, Pull Request #452). Thoughts/comments appreciated.

If you want to use a Julia thread, you can simply use

t = Threads.@spawn MPI.Waitall!([rreq, sreq])

(though you will need to start Julia with multiple threads, and use MPI.Init_thread)
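
As a small sketch of the first prerequisite, one could guard the spawn with a check on the thread count. `spawn_blocking` is a hypothetical helper name, and the `sleep` call below is only a stand-in for the blocking MPI wait:

```julia
# Threads.@spawn can only overlap a blocking call with other work when Julia
# was started with more than one thread (e.g. `julia -t 2`).
# `spawn_blocking` is a hypothetical helper, not part of Julia or MPI.jl.
function spawn_blocking(f)
    if Threads.nthreads() == 1
        @warn "Julia is running with one thread; a blocking call will stall all tasks"
    end
    return Threads.@spawn f()
end

# Stand-in for the blocking wait; returns a marker value when finished.
t = spawn_blocking(() -> (sleep(0.1); :done))
# ... other work can run here ...
result = fetch(t)
```

With the requests from the question, the call would be `spawn_blocking(() -> MPI.Waitall!([rreq, sreq]))`, after initializing MPI with a thread level that permits calls from several threads.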
