Distributed compute is such an important part of scientific computing that I’m a bit worried this part of the ecosystem is in such a dismal state.
If ClusterManagers is not a reliable option, how do people typically do distributed computation in Julia? Maybe there’s some package everybody is using these days that I am not aware of…
I tried to find a solution for this last year, gave up after a while, and ended up using Ray, which has gone pretty smoothly.
I suspect you’ll have an easier time using Python/Ray to do all the cluster stuff and subprocess.run to launch Julia processes than you will trying to do everything in Julia.
When in a cluster/distributed context, I try to stick to the basics and avoid any workflow that sounds even mildly complex.
I launch everything from Python, but use subprocess.run to call the workhorse (Julia or C++). Probably PythonCall would work, but I haven’t tried it.
If I need an object to pass back and forth, I’ll try to serialize it to a file inside the subprocess (Julia), then read it back when the subprocess returns to the Ray (Python) worker, which can put it into the distributed object store. So the serialization format has to be something both languages can understand; in my case that’s Parquet, but there are probably better options.
Note my workflow is “embarrassingly parallel” and mostly linear, so YMMV.
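For what it’s worth, a minimal sketch of the Julia side of that handoff might look like the following. The script name, argument handling, column name, and output path are all made up for illustration, and it uses Arrow.jl rather than Parquet simply because Arrow is another format both Julia and Python (via pyarrow) can read; the Python/Ray worker launches it with subprocess.run and reads the output file back afterwards.

```julia
# worker_task.jl -- hypothetical script launched from Python via
#   subprocess.run(["julia", "worker_task.jl", input_path, output_path])
using DataFrames, Arrow

input_path, output_path = ARGS[1], ARGS[2]

# Read the inputs the Python/Ray worker wrote to disk.
df = DataFrame(Arrow.Table(input_path))

# Do the actual (embarrassingly parallel) work; `x` is a placeholder column.
df.result = df.x .^ 2

# Serialize the result to a file the Python side can read back (e.g. with
# pyarrow) and put into Ray's distributed object store.
Arrow.write(output_path, df)
```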
Thanks, I see. Yeah, I definitely need fairly frequent all-to-all communication between workers, so sadly this wouldn’t work for me. But I will still check out Ray for launching… maybe there’s a super simple way to just have every worker communicate via files. (My stuff is not high throughput, it just needs to be fairly regular and asynchronous.)
Thanks; do you mean people manually make calls via MPI.jl? I checked out MPIClusterManagers.jl but it also seems pretty broken…
It’s really weird to me that the distributed computing side of Julia is this deficient. This seems like one of the main things Julia would be really good at, given the community’s focus on scientific computing… Am I missing something here? Is everybody just batch-submitting single-node Julia scripts, and are we among a small handful of people trying to do true multi-node stuff?
There are also https://github.com/ray-project/ray/pull/40098 and https://github.com/beacon-biosignals/Ray.jl, which look exciting, though I have used neither.
People who really care about distributed computing, especially involving more than just two or three nodes, use MPI. It’s the industry standard and well supported in Julia through MPI.jl. The (big) drawback is that you pretty much lose all interactivity.
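To make that concrete, here is a minimal MPI.jl sketch; the all-reduce and the launch commands are just illustrative of the style of program you end up writing.

```julia
# hello_mpi.jl -- launch with e.g. `mpiexecjl -n 4 julia hello_mpi.jl`,
# or with `srun` inside a Slurm allocation.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Each rank contributes a local value; Allreduce sums them on every rank,
# i.e. the kind of collective communication MPI is designed for.
local_value = rank + 1
total = MPI.Allreduce(local_value, +, comm)

println("rank $rank of $nranks: total = $total")
MPI.Finalize()
```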
As I would describe it, Distributed.jl is an attempt to enable the user to do convenient and interactive small-scale distributed computing. It is a non-standard approach in a less explored design space. The people who have been creating and working on it have focused on pushing Julia on other fronts (e.g. @vchuravy has brought us pkgimages), and attracting new people to work on it has been difficult because, among other things, it has lived in the main Julia repo. Hopefully we will see more activity once it becomes a separate package.
Thanks, this is very helpful! I will switch to MPI.jl then.
I think my mistake was assuming Distributed.jl being in the stdlib meant it was the standard Julia approach for anything involving distributed compute. (Whereas MPI.jl being a package meant it was an experiment.)
Argh, I guess this means you couldn’t do distributed compute from PythonCall.jl then, right? Since you have to launch Julia via mpiexecjl? Looks like I don’t really have any good options here… All I really want is to start up Julia and execute an operation over my Slurm allocation. One year ago this seemed to work with ClusterManagers, but now things seem to have broken.
ClusterManagers.jl is distinct from Distributed.jl; the latter is a standard library and quite stable (it could benefit from a dedicated maintainer, but its stability is also a boon).
ClusterManagers.jl is an external package that provides integration with diverse schedulers. It has proven challenging to maintain, partly because of the diversity of schedulers, the inherent difficulty of matching one code pattern to many targets, and the problem of providing CI for all of them. I have come to believe that ClusterManagers.jl should have ignored the question of requesting resources and instead focused on creating a Distributed.jl cluster within an existing allocation.
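For example (just a hypothetical sketch of that idea, not what ClusterManagers.jl actually does today): inside an existing sbatch/salloc allocation you can, in principle, build the worker list from Slurm’s environment and hand it to stock Distributed, assuming the site permits SSH between compute nodes.

```julia
# Hypothetical sketch: run inside an existing Slurm allocation.
# Assumes passwordless SSH between compute nodes, which not every site allows.
using Distributed

# Expand the allocation's node list, e.g. "node[01-04]" -> ["node01", ...].
nodes = split(read(`scontrol show hostnames $(ENV["SLURM_JOB_NODELIST"])`, String))

# Start one worker per available core on each node via the stock SSH manager.
addprocs([(node, :auto) for node in nodes]; tunnel=true, exeflags=`--project`)

@everywhere println("worker $(myid()) on $(gethostname())")
```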
MPIClusterManagers.jl was spun out of MPI.jl; it mostly ignores the question of requesting resources and instead uses MPI within an allocation to create a Distributed.jl cluster.
Additional maintainers for ClusterManagers.jl, or competing packages, would be welcome, and I particularly believe that Dagger.jl (cc: @jpsamaroo) needs support for cluster schedulers.
Thanks everyone. (Sorry to nitpick here, but it is perhaps a bit misrepresentative that it is called “Distributed” when, by itself, it only handles multiprocessing on a single node, which is not what I would call distributed computing… but that’s a minor point and I digress.) Edit: I am completely wrong, see @aplavin’s helpful correction below!
@CameronBieganek I might not have been clear, but I have been using ClusterManagers and Distributed quite extensively for maybe 3 years now. They are the current distributed computing backend for PySR and SymbolicRegression.jl. In the past I have been able to get decent communication going over ~4 nodes x 128 cores with ClusterManagers.jl + Distributed.jl (your example has --nodes=1, which is why you can get away with only Distributed.jl: it doesn’t need to interact with Slurm).
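For reference, a rough sketch of that pattern (not verbatim from SymbolicRegression.jl; the Slurm flags are placeholders for whatever your allocation needs) is:

```julia
# Sketch of the ClusterManagers.jl + Distributed.jl pattern; the keyword
# arguments below are placeholders and get forwarded to srun.
using Distributed, ClusterManagers

# Ask Slurm for 512 workers (e.g. 4 nodes x 128 cores).
addprocs(SlurmManager(512); partition="compute", time="24:00:00")

@everywhere using SymbolicRegression  # load the work package on every worker

# ... run the distributed search, then tear the workers down:
rmprocs(workers())
```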
It’s just that they’ve always felt a bit hacky and unreliable, with users of my libraries complaining they couldn’t get things working on PBS clusters (it turns out ClusterManagers has been broken there for two years, with no fixes on the horizon).
And now Slurm clusters have broken too! Of course there’s always the option to fix everything myself, but that’s not exactly what I signed up for as a user.
In its current state you get people like me spending weeks and weeks implementing support for it, thinking I will be able to support all of these types of clusters, only to find out it’s actually unmaintained and broken. The README should just be honest about this and point people to MPI.jl instead.
My experience with clusters and ClusterManagers.jl is that it vastly depends on who is submitting and maintaining the recipes. Clusters vary quite a bit. Perhaps the most consistency I have seen is within the DOE supercomputers, but even then there are variations between the clusters.
This mostly works for me, but that’s because @bjarthur put together the LSF support. I can physically locate him with a short walk or a Zoom call.
My best advice for this is to get your cluster admins involved. I have to do the same thing for Dask, Spark, and Ray deployments. For the most part, cluster admins seem to appreciate being asked rather than having users try to improvise a solution. The main time I’ve seen this become a headache is when cluster administration has been outsourced to a third party.
Thanks for the tip @mkitti, I can try. I should mention that most of my concern is about downstream users rather than myself. I can always hack together a working solution for my own cluster, but I am more interested in keeping my downstream users happy and able to use their clusters for SymbolicRegression.jl searches. For every barrier in the way (such as needing to email their sysadmin, etc.), I will end up losing a significant fraction of potential users.