I recently found out about Ray, a project that is gaining some traction in the Python community, mostly focused on building/running distributed applications for machine learning. I am interested in the following questions:
- Is there a Julia library out there for this?
- Is there already an idiomatic way to do this in Julia?
- Can it be done in pure Julia right now?
Where I already looked:
- Found no Ray.jl or similar
- Looked at the Ray GitHub for Julia references, found none
- StackOverflow for Ray is empty
- Read the Parallel Computing docs for a general understanding of the topic, but wasn’t able to find a similarly simple interface
I’m not affiliated with the project in any way. I’m just interested to see whether a similar library exists, or should exist, in the Julia ecosystem.
At first glance, this looks pretty similar to the remotecall interface already built into Julia (see https://docs.julialang.org/en/v1/manual/parallel-computing/index.html#Multi-Core-or-Distributed-Processing-1). Do you know what features Ray has beyond that?
@ray.remote() looks like it defines the function on remote nodes. In Julia, @everywhere will do that.
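To make the comparison concrete, here is a minimal sketch using only Julia's Distributed standard library. The function name `double` is just an illustrative placeholder; `remotecall` returns a `Future` immediately, which plays a role loosely analogous to a Ray object ref:

```julia
using Distributed

# Start two local worker processes (on a cluster you would use a ClusterManager instead).
addprocs(2)

# Define a function on every worker, similar in spirit to @ray.remote.
@everywhere double(x) = 2x

# remotecall returns a Future right away; fetch blocks until the result is ready,
# much like ray.get does on an object ref.
f = remotecall(double, workers()[1], 21)
println(fetch(f))  # 42

# remotecall_fetch combines the call and the fetch into one step.
println(remotecall_fetch(double, workers()[2], 10))  # 20
```

The main ergonomic difference is that Ray wraps this pattern in a decorator plus `ray.get`, while Julia exposes the call/fetch steps directly.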
The Ray core looks most similar to Dagger.jl
They both let you define compute graphs (DAGs) that will run in parallel on one or more compute nodes, with vaguely similar interfaces. Both have distributed dataframe implementations built on top of them (JuliaDB and Modin, respectively).
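As a hedged sketch of what such a compute graph looks like on the Dagger side (using the `Dagger.@spawn` API; older versions used `delayed`, so details may differ across releases):

```julia
using Distributed
addprocs(2)

@everywhere using Dagger

# Build a small DAG: two independent tasks feeding a third.
# Dagger's scheduler places each task on an available worker.
a = Dagger.@spawn 1 + 2
b = Dagger.@spawn 3 * 4
c = Dagger.@spawn a + b   # depends on both a and b

println(fetch(c))  # 15
```

Ray's equivalent would chain `@ray.remote` task invocations and pass object refs between them; in both systems the dependency graph is inferred from which task results feed into which calls.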
One of Ray’s original use cases was training ML models on GPUs, so it has good support for scheduling GPU resources (Dagger has this unmerged pull request), and the Ray devs are the original authors of the Plasma in-memory object store, which is generally useful for certain GPU model training scenarios.
Not sure how Dagger fares in terms of fault tolerance these days; I see that some related PRs are in the works. Ray uses Redis instances to store state, a distributed task scheduling model that might be resilient to some node failures and allows tasks to spawn sub-tasks, and Plasma.
Maybe @jpsamaroo can elaborate on plans for Dagger?
Ray seems to be an interesting project. In some ways, it is similar to Dagger, although its approach to scheduling and assigning work is noticeably different:
- Ray schedules work from node to worker, Dagger schedules from master to worker (no node abstraction in between for now).
- Ray’s object store can use shared memory between workers on the same node, Dagger’s object store (MemPool) probably does not, but I haven’t dug into those details too much so far.
- Ray relies on many external (non-Python) dependencies, like Redis and Bazel, while Dagger is pure-Julia.
- Ray exposes direct access to its datastores via put/get, Dagger does not (although it could).
- Ray has explicit GPU support, Dagger does not, but see below for future directions.
- Dagger has fewer domain-specific integrations (like ML or GPU support).
- Ray is very well documented (from a light perusal of the docs), Dagger’s docs… suck.
- Ray (probably) has many more active contributors than Dagger, and has a larger ecosystem to draw contributors from.
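On the put/get point above: Julia's Distributed standard library already offers a rough analogue of Ray-style object references, even though Dagger doesn't expose its MemPool datastore directly. A hedged sketch of the correspondence (using `identity` as a trivial stand-in for "just store this value"):

```julia
using Distributed
addprocs(1)

w = first(workers())

# Roughly like ray.put(x): ship a value to a worker and keep a Future to it.
ref = remotecall(identity, w, [1, 2, 3])

# Roughly like ray.get(ref): block until the value is available and copy it back.
println(fetch(ref))  # [1, 2, 3]
```

This lacks Ray's shared-memory zero-copy reads within a node, but the reference-passing semantics are similar.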
Otherwise, the two projects are very similar in usage and semantics (eerily so, in my opinion). One is not necessarily better than the other, and both have plenty of potential. Since I don’t know much more about Ray, here’s what I see as the future for Dagger:
- More direct access by users to MemPool’s datastore
- Knowledge about node layout and features made available to the scheduler
- Specific details about latencies (network, thunk/job, scheduling, etc.) “learned” by the scheduler to improve efficiency and occupancy
- Improved scheduling efficiency overall (there are some low-hanging fruit still)
And GPU support gets its own section:
GPU support for distributed computing is non-trivial because of how differently GPUs behave compared to CPUs. They require an extra hop for data transfer, do not currently have great support for global memory access in most operating systems (compared to CPUs), handle certain workloads much more or less efficiently than CPUs, have a variety of “weird”, niche instructions and storage formats, etc. Additionally, the layout of GPUs in a compute cluster is not as obvious or as easy to detect as that of CPUs, since they are attached devices (not usually integrated into the motherboard or CPU chipset, although sometimes they are!).
That said, GPU support in Dagger is something I’ve been thinking about here and there. The linked PR is a start at solving this, but I don’t believe it actually touched the scheduler at all, so its efficiency will be hard to predict and manage. For GPU support in Dagger to be efficient, Dagger needs to be able to figure out where GPUs exist (on which nodes), what their capabilities are, and how well the user’s tasks run on a GPU vs. a CPU. Communicating this information “by hand” is probably infeasible for anything other than basic matrix multiplications, so I believe that allowing Dagger to benchmark equivalent CPU and GPU kernels will be key (something that both CuArrays/ROCArrays and GPUifyLoops can make possible). However, I haven’t gotten to the point of figuring out what that interface will look like, so this is all just hand-waving; I don’t have the slightest idea yet of how to implement it either.
How does Dagger compare to Dask?