I would like to hide the MPI calls needed to wait for a non-blocking Isend / Irecv! by using Julia's asynchronous programming capabilities. I can imagine 3 main ways of doing this (see the code below), but I am not certain which one is the way to go.
Which one is best? Are there other, better alternatives? Any help would be very welcome!
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
# Start a non-blocking exchange
N = 4
send_mesg = Array{Float64}(undef, N)
recv_mesg = Array{Float64}(undef, N)
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst = mod(rank+1, size)
src = mod(rank-1, size)
fill!(send_mesg, Float64(rank))
rreq = MPI.Irecv!(recv_mesg, src, src+32, comm)
print("$rank: Sending $rank -> $dst = $send_mesg\n")
sreq = MPI.Isend(send_mesg, dst, rank+32, comm)
# Option 1
# Will the underlying C call to MPI_Waitall block the entire Julia process?
# Will other code have a chance to run between @async and wait before the task finishes?
t = @async begin
stats = MPI.Waitall!([rreq, sreq])
print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(t)
# Option 2
# is the while loop efficient? will it provide room for other tasks to run?
t = @async begin
done = false
while !done
done, stats = MPI.Testall!([rreq, sreq])
end
print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(t)
# Option 3
# I would say that in this case other code can run between @task and schedule,
# but I would like to avoid explicitly calling schedule
t = @task begin
stats = MPI.Waitall!([rreq, sreq])
print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(schedule(t))
It would be good to have a way to integrate MPI_Wait with the libuv event loop used by Julia tasks (green threads), so that Julia tasks can execute while MPI is waiting and so that the waiting task can wake up as soon as the request is available.
Using MPI.Waitall (options 1 or 3) won’t work, because that call does not return until a request completes. Using a spin loop (option 2) should work as long as you call yield() in the loop, but it is pretty inefficient: it consumes a lot of cycles testing over and over again.
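For reference, a minimal sketch of option 2 with an explicit yield() added (reusing the variables from the code in the question); without the yield() the @async task would monopolize the scheduler once it starts running, since the loop otherwise contains no yield point:
t = @async begin
    done = false
    while !done
        done, stats = MPI.Testall!([rreq, sreq])
        yield()  # give other Julia tasks a chance to run between tests
    end
    print("$rank: Received $src -> $rank = $recv_mesg\n")
end
# run some code here before the communication is done
wait(t)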
The ideal thing would be if an MPI_Request could be converted to a file descriptor, since libuv can wait efficiently on file descriptors via poll, but I don’t see a standard way to do this (even though some MPI implementations may use file descriptors internally for asynchronous requests).
However, MPI_Wait is thread-safe, along with MPI_Waitall, so that offers another option. You could spawn a (real) thread to wait on MPI, and when the wait succeeds you could use uv_async_send to notify the main libuv (Julia) event loop. See also here and here.
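A hedged sketch of that idea, using Base.AsyncCondition (which wraps a libuv uv_async_t handle) with a Julia thread standing in for the waiting thread; it assumes a Julia session started with multiple threads and a thread-safe MPI initialization (see MPI.Init_thread below), and is only meant to illustrate the notification mechanism:
cond = Base.AsyncCondition()          # wraps a libuv uv_async_t
Threads.@spawn begin
    MPI.Waitall!([rreq, sreq])        # blocks this thread only
    # wake up the libuv event loop; safe to call from another thread
    ccall(:uv_async_send, Cint, (Ptr{Cvoid},), cond.handle)
end
# run some code here before the communication is done
wait(cond)                            # resumes as soon as uv_async_send fires
print("$rank: Received $src -> $rank = $recv_mesg\n")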
It would be nice if something like this were implemented in MPI.jl, since it’s rather low-level stuff that most users wouldn’t want to muck with directly, though now that Julia supports real threading it should be easier.
MPI also has something called a generalized request that allows you to define custom mechanisms for asynchronous operations. I haven’t read the documentation closely yet, but this might ultimately be the best way to integrate MPI requests with Julia.
Thanks for your answers @stevengj!
And what about using @threadcall? (Multi-Threading · The Julia Language)
@threadcall on MPI_Wait or MPI_Waitall sounds like a good choice here, too; I didn’t actually know about that macro! Since it only works with ccall, you’ll have to re-implement MPI.Waitall!, or submit a patch to MPI.jl; it might be reasonable to have a keyword argument to do this.
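A very rough sketch of what such a re-implementation could look like; the names MPI.libmpi, MPI.MPI_Request, MPI.Status, and req.val are assumptions about MPI.jl internals (they vary across MPI.jl versions and MPI implementations), and the helper threaded_waitall is hypothetical:
# Hypothetical helper: run the blocking MPI_Waitall on a libuv threadpool
# thread via @threadcall, so the calling Julia task yields until it returns.
function threaded_waitall(reqs::Vector{MPI.Request})
    raw   = MPI.MPI_Request[r.val for r in reqs]        # assumed raw handles
    stats = Array{MPI.Status}(undef, length(reqs))
    # C signature: int MPI_Waitall(int count, MPI_Request reqs[], MPI_Status stats[])
    @threadcall((:MPI_Waitall, MPI.libmpi), Cint,
                (Cint, Ptr{MPI.MPI_Request}, Ptr{MPI.Status}),
                length(reqs), raw, stats)
    return stats
end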
I’ve added a PR for a @threadcall-based wait here: RFC: Define wait(req) to use threadcall by simonbyrne · Pull Request #452 · JuliaParallel/MPI.jl · GitHub. Thoughts/comments appreciated.
If you want to use a Julia thread, you can simply use
t = Threads.@spawn MPI.Waitall!([rreq, sreq])
(though you will need to start Julia with multiple threads, and use MPI.Init_thread)
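Putting that together, a minimal sketch; the thread level THREAD_MULTIPLE is an assumption here, a lower level may suffice depending on which threads post the requests:
# Start Julia with e.g. `julia -t 2` and initialise MPI with thread support
# instead of plain MPI.Init():
MPI.Init_thread(MPI.THREAD_MULTIPLE)
# ... post Irecv!/Isend as in the original code ...
t = Threads.@spawn MPI.Waitall!([rreq, sreq])   # blocks one thread, not the scheduler
# run some code here before the communication is done
wait(t)
print("$rank: Received $src -> $rank = $recv_mesg\n")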
In Chmy.jl, we recently combined a task-based approach with MPI and asynchronous GPU operation (relying on TLS), with the aim of hiding MPI communication behind stencil computation. The approach successfully allowed scaling a 3D thermo-mechanical Stokes flow to the entire LUMI supercomputer. It combines long-running tasks (workers initialised by a launcher) on which task-local exchangers and a stack allocator provide the work to achieve the halo exchange where cooperative waiting is needed.