MPI Jupyter kernel & cluster in Julia (similar to dask-mpi / ipyparallel)?

I’m on a cluster where MPI transport is significantly faster than TCP, and I’d like to keep my interactive Jupyter-based workflow for parallel work. A key requirement is that I don’t want to do any inter-process communication “by hand”; I just want to use the normal Julia Distributed constructs like @everywhere, pmap, etc., and have those transfer objects for me (via MPI transport).

MPIClusterManagers.jl has MPIManager which can set up something like:

1)          MPI   
         __________
       / | worker |
kernel - | worker |
       \ | worker |
         ----------

but the kernel is not part of the MPI pool so distributing work from the kernel happens via slow TCP.
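For concreteness, here’s roughly what option 1 looks like in code with MPIClusterManagers (an untested sketch; the worker count and closure are just placeholders):

```julia
using MPIClusterManagers, Distributed

# Launch 4 MPI ranks (via mpiexec) which join as Distributed workers.
# The workers talk to each other over MPI, but the master (the kernel)
# is outside the MPI pool, so this next hop is TCP:
manager = MPIManager(np = 4)
addprocs(manager)

# These go over TCP from the kernel to each worker:
@everywhere f(x) = x^2
pmap(f, 1:4)
```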

Something better (dask-mpi and ipyparallel have something like this I believe):

2)                MPI   
         _______________________
         |            / worker |
kernel - | controller - worker |
         |            \ worker |
         -----------------------

where there’s one slow TCP send to the controller, but data is then scattered via fast MPI from the controller to the workers. Is there anything like this in Julia?

Finally, maybe the most ideal thing (but maybe hardest) to me seems like if the Jupyter kernel could just be part of the MPI pool, like:

3)       MPI   
_____________________
|        /   worker |
| kernel -   worker |
|        \   worker |
---------------------        

I actually hacked together something like this by making a custom kernel.json file, but it’s pretty brittle, hangs when the kernel is shut down or restarted, and is not super usable. Is anyone aware of something like this done better by someone?

Thanks.


Unfortunately, there isn’t a great way to do this, though option 3 sounds like the simplest way forward. This sounds like a good item for the JuliaHPC call, if you’d like to join.


Thanks for the tip, will try to join!

Option 3 should also be doable at NERSC: we allow Jupyter jobs where the kernel runs inside an srun job. I can talk with some of the folks who have set it up.


There is a function in the IJulia.jl build script for adding custom Jupyter kernels:

It might be possible to use or adapt that.
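For reference, that function is IJulia.installkernel. A minimal call looks something like this (untested sketch; the kernel name and env are just examples). Note that it writes a kernel.json whose argv starts with the julia binary itself, so wrapping the launch in srun would still require editing the generated file or adapting the function:

```julia
using IJulia

# Installs a new Jupyter kernelspec; extra string arguments are
# passed as command-line flags to julia when the kernel starts.
installkernel("Julia MPI", "--project=@.";
              env = Dict("JULIA_NUM_THREADS" => "1"))
```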

Thanks (sorry I couldn’t join today in the end). For definiteness, here’s my suboptimal solution for (3):

# mpi_kernel.jl
using Distributed, MPI, MPIClusterManagers

MPI.Init()

# Rank 0 returns from this call and becomes the kernel; the other
# ranks enter the worker loop and connect back over MPI transport.
manager = MPIClusterManagers.start_main_loop(
    MPI_TRANSPORT_ALL,
    stdout_to_master=true,
    stderr_to_master=true
)

# Rank 0 then runs the stock IJulia kernel entry point:
include("/global/homes/m/marius/.julia/packages/IJulia/e8kqU/src/kernel.jl")

and

# kernel.json
{
  "display_name": "Julia 1.6.1 (MPI)",
  "argv": [
    "srun",
    "/global/u1/m/marius/src/julia-1.6.1/bin/julia",
    "-i",
    "--color=yes",
    "--project=@.",
    "/global/u1/m/marius/.local/share/jupyter/kernels/julia-1.6-mpi/mpi_kernel.jl",
    "{connection_file}"
  ],
  "language": "julia",
  "env": {},
  "interrupt_mode": "signal"
}

So basically the kernel srun’s the mpi_kernel.jl file, rank 0 becomes the actual kernel by calling the IJulia file that would originally have been called, and the other ranks connect back as workers. It works, but the downsides are:

  • It’s a different kernel, so it gets saved to the notebook file, and if you open the same notebook outside of an MPI environment it’ll basically crash.
  • There isn’t any way to control how many workers you get; you just get the entire allocation the job was submitted with. It would be nicer if you could somehow delay spawning the workers until you’re actually in the notebook and could choose how many you want.
  • It doesn’t shut down cleanly, so to restart it you have to resubmit your entire batch job (or maybe kill some processes; I haven’t figured out exactly how). This is probably the most annoying issue.

If these were solved, I definitely think you could use that IJulia code to automate making such kernels.

Are you running the Jupyter session from within an salloc, or does the srun do the allocation?

From within (via a JupyterHub instance managed by them).

I did a little reading the other day, and I think one way to fix my second issue above is to initially spawn the kernel with only one MPI process, then provide a command to spawn additional workers which would call MPI_Comm_spawn, and then use MPI_Intercomm_merge to merge the newly created communicator back into the main global one, which (presumably) is what Julia uses to serialize objects to workers. I may have missed some hitch in this plan, though.
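The spawning step above might look something like this with MPI.jl’s wrappers (an untested sketch; the worker script name is hypothetical, and I haven’t checked how the merged communicator would be handed off to MPIClusterManagers):

```julia
using MPI
MPI.Init()

# Spawn extra worker processes at runtime from the single-rank kernel.
nworkers = 4
intercomm = MPI.Comm_spawn("julia", ["worker_entry.jl"], nworkers, MPI.COMM_SELF)

# Merge the inter-communicator into one intra-communicator containing
# the kernel and the new workers (false => keep the kernel's ranks low).
newcomm = MPI.Intercomm_merge(intercomm, false)
```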


I think the best solution in this case would look similar to LLNL’s bridge kernel, which they explain a bit in this paper (it may even be possible to reuse some of those parts).