I’m on a cluster where MPI transport is significantly faster than TCP transport, and I’d like to keep my interactive Jupyter-based workflow for parallel work. A key requirement is that I don’t want to do any intercommunication “by hand”; I just want to use normal Julia Distributed constructs like pmap, etc., and have those transfer objects for me (via MPI transport).
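For concreteness, this is the kind of workflow I want to keep unchanged (a minimal sketch; only the transport underneath should differ):

```julia
using Distributed
addprocs(4)                          # however the workers end up being launched

@everywhere heavy(x) = sum(abs2, x)

# I want calls like this to just work, with the object transfers
# riding MPI instead of TCP:
results = pmap(heavy, [rand(10^6) for _ in 1:32])
```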
The closest existing tool I’m aware of is MPIManager (from MPIClusterManagers.jl), which can set up something like:
1)

```
             MPI
          __________
         /| worker |
kernel - | worker |
         \| worker |
          ----------
```
but the kernel is not part of the MPI pool, so distributing work from the kernel happens via slow TCP.
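For reference, a minimal sketch of setup (1) as I understand MPIClusterManagers.jl’s default mode, where MPI is only used among the workers themselves:

```julia
using MPIClusterManagers, Distributed

# Launches 4 workers via mpirun; the workers form an MPI communicator
# among themselves, but the manager process (here, the Jupyter kernel)
# talks to them over TCP.
manager = MPIManager(np = 4)
addprocs(manager)

# Works, but every kernel <-> worker transfer goes over slow TCP:
results = pmap(x -> sum(abs2, x), [rand(10^6) for _ in 1:8])
```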
Something better would look like this (I believe dask-mpi and ipyparallel have something along these lines):
2)

```
                   MPI
         _______________________
         |            / worker |
kernel - | controller - worker |
         |            \ worker |
         -----------------------
```
where there’s one slow TCP send to the controller, but then the data is scattered via fast MPI from the controller to the workers. Is there anything like this in Julia?
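To make the data path in (2) concrete, here is roughly how I could emulate it by hand today with MPIClusterManagers.jl + MPI.jl; this hand-rolled intercommunication is exactly what I’m hoping some package already hides from me (the pid/rank mapping and the MPI.bcast signature are my assumptions from the current MPI.jl docs):

```julia
using MPIClusterManagers, Distributed
manager = MPIManager(np = 4)
addprocs(manager)

data = rand(10^6)

# Pick a "controller" worker; I'm assuming the first worker pid corresponds
# to MPI rank 0 (in general the manager's rank <-> pid mapping should be consulted).
controller = workers()[1]

@everywhere workers() payload = nothing    # placeholder on every worker
@everywhere [controller] payload = $data   # the one slow TCP send: kernel -> controller

# Fan the data out over fast MPI among the workers themselves:
@everywhere workers() begin
    using MPI
    global payload = MPI.bcast(payload, MPI.COMM_WORLD; root = 0)
end
```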
Finally, the most ideal thing to me (but maybe the hardest) would be if the Jupyter kernel could itself just be part of the MPI pool, like:
3)

```
          MPI
  ___________________
  |        / worker |
  | kernel - worker |
  |        \ worker |
  -------------------
```
I actually hacked together something like this by making a custom kernel.json file, but it’s pretty brittle: it hangs when the kernel is shut down or restarted, and it isn’t really usable. Is anyone aware of someone having done this better?
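For reference, the shape of the hack was roughly the following (all paths and the wrapper script are illustrative, not a polished recipe). The kernel.json just launches the whole kernel under mpiexec:

```json
{
  "display_name": "Julia (4 MPI ranks)",
  "language": "julia",
  "argv": [
    "mpiexec", "-n", "4",
    "julia", "/path/to/mpi_kernel.jl", "{connection_file}"
  ]
}
```

and the wrapper script has rank 0 run the actual Jupyter kernel while the other ranks sit in a worker loop; here sketched with MPIClusterManagers’ MPI_TRANSPORT_ALL mode, which uses MPI as the Distributed transport for everything, kernel included:

```julia
# /path/to/mpi_kernel.jl (illustrative)
using MPIClusterManagers, Distributed
using IJulia

# All ranks enter here; rank 0 returns a cluster manager and keeps going,
# while the other ranks block inside a worker loop with MPI as the transport.
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

# Rank 0 only from here on: run IJulia's kernel entry point, which picks up
# the connection file Jupyter passed us in ARGS.
include(joinpath(dirname(pathof(IJulia)), "kernel.jl"))
```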