Resources to Understand Distributed.jl

Besides the Distributed.jl documentation, are there any other good resources for understanding the theory and application of the library? I figured I’d ask here before doing a deep dive on the topic.

I think @jpsamaroo recommends Dagger.jl instead of Distributed.jl.

Oh interesting. What’s the reason for that?

Distributed.jl doesn’t provide a whole lot of (composable) tools for doing distributed parallelism, in my opinion. pmap only does map operations, and @distributed runs a parallel for loop with an optional reduction, but that’s really it. If you want to do parallel computing on any “real” data structure, like an array, table, or graph, you need to reach for something more capable (like Dagger.jl, which provides distributed implementations of all three of those data structures).
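To make the two tools mentioned above concrete, here is a minimal sketch of pmap and @distributed. It assumes two local worker processes, and slow_square is a hypothetical stand-in for real work; it is an illustration, not a recommended pattern.

```julia
using Distributed

# Assumption: two local workers. On a cluster you would pass hosts
# to addprocs or use a cluster manager instead.
addprocs(2)

# Hypothetical workload. @everywhere defines it on every process,
# which is required before workers can call it.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap: a parallel map over a collection, one call per element.
squares = pmap(slow_square, 1:8)

# @distributed with a reducer: the range is split across workers and
# the per-iteration values are combined with (+).
total = @distributed (+) for i in 1:8
    slow_square(i)
end

println(squares)
println(total)

rmprocs(workers())
```

Both calls hand back plain local values on the calling process. Anything beyond "map" or "loop and reduce" has to be built by hand from lower-level pieces.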

Additionally, Distributed.jl fails to provide any useful abstractions for orchestrating data movement or remote task execution, which matters for any kind of more complicated distributed parallelism. It would be like trying to do multithreaded programming without tasks or built-in atomics - you could do it, but you’d have to do a lot of work yourself to fill in what’s missing. Again, Dagger provides abstractions for data management and provides tasks which can seamlessly execute across a cluster of Julia processes.
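As a rough sketch of what those tasks look like, assuming Dagger.jl is installed and two local workers are available (load_chunk and combine are hypothetical functions made up for this example):

```julia
using Distributed

# Assumption: two local workers; Dagger.jl must be installed in the
# active environment.
addprocs(2)
@everywhere using Dagger

# Hypothetical functions, defined on every process so that any
# worker can run them.
@everywhere load_chunk(i) = fill(i, 4)
@everywhere combine(a, b) = sum(a) + sum(b)

# Each Dagger.@spawn returns a task handle right away. Dagger's
# scheduler decides where each task runs.
a = Dagger.@spawn load_chunk(1)
b = Dagger.@spawn load_chunk(2)

# Passing task handles as arguments makes this task depend on them;
# Dagger moves the results to wherever combine runs.
c = Dagger.@spawn combine(a, b)

result = fetch(c)
println(result)

rmprocs(workers())
```

Nothing in the code says which process holds which chunk or how the chunks travel between processes. That bookkeeping is what you would otherwise write yourself on top of Distributed.jl.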

Still, it’s not like Distributed.jl is inherently bad - Dagger.jl uses it for all of its distributed operations, to move data and execute functions remotely. But it’s just not a great user-facing interface - it really should be hidden behind a more user-friendly library, like Dagger.jl, where it can be a platform to build upon.
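For reference, the low-level Distributed.jl primitives for running a function remotely look roughly like this. It is only a sketch with one local worker and does not show how Dagger.jl calls them internally.

```julia
using Distributed

# Assumption: a single local worker, just for illustration.
addprocs(1)
w = first(workers())

# remotecall runs a function on a given worker and returns a Future.
fut = remotecall(+, w, 1, 2)

# fetch blocks until the result is ready and moves it to the caller.
three = fetch(fut)

# remotecall_fetch does both steps in one round trip; here it asks
# the worker for its own process id.
remote_id = remotecall_fetch(myid, w)

println(three)
println(remote_id == w)

rmprocs(workers())
```

You pick the worker and handle each Future yourself, which is fine as a foundation but gets tedious once many interdependent calls are involved.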


Awesome to know!

I was having problems getting Distributed.jl to work yesterday, so I’ll look into Dagger.jl.

@jpsamaroo, I would greatly appreciate it if you could take a look at my first question on Dagger.jl: Not seeing speed-up with Dagger.jl. Am I doing something wrong?