Graph computing benchmarks: comparing the scalability of Dask, Dagger.jl, TensorFlow and Julius

Graph computing solutions like Dask and Dagger.jl are gaining popularity among developers, mainly because of their easy built-in distribution capabilities. If you are considering one of these graph computing solutions for your next project but are uncertain how well they scale to real-world use cases, this benchmark has answers for you.

Overall, Julius scales 100-1000 times better than the best alternatives, depending on the problem, making it the only graph solution suitable for enterprise use cases. I know the numbers sound too good to be true, but you can verify them yourself by signing up for developer access to Julius here.

Comments and suggestions are very welcome!


Hi @Yadong_Li! I’ve taken an interest in this benchmark over the last week, and will be publishing a blog post providing some commentary on these results from my perspective as Dagger’s maintainer, including how I improved Dagger’s runtime on the benchmark with some performance optimizations. I’ll provide a link once it’s published!

One thing I found while working with this benchmark is that Dagger and Dask are actually receiving an unfair advantage: in both the y_n and s_n implementations, the final spawned task is never waited on/fetched, so the benchmark function returns before all computations have completed. This may not be an issue in Jupyter (perhaps the results are automatically fetched when printing the final value?), but it is problematic when running from the REPL or a regular script. I have updated benchmark scripts for Dagger and Dask at Julius Graph Benchmarks: Dagger and Dask · GitHub. I would love it if you could use the relevant pieces from those updated scripts to update the benchmark results!
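To make the pitfall concrete, here is a minimal sketch using only Python's stdlib `concurrent.futures` (it is not the benchmark code itself, and the function names are made up for illustration). The same effect occurs when `fetch` is omitted in Dagger.jl or `.compute()` in Dask: the timed function returns as soon as the last task is *submitted*, not when it *finishes*, so the measured time undercounts the real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    # Stand-in for a real task in the benchmark graph.
    time.sleep(0.2)
    return x * x

executor = ThreadPoolExecutor()

def benchmark_wrong():
    # Submits the task but never waits on the future:
    # the function returns while the work is still running.
    fut = executor.submit(slow_square, 10)
    return fut

def benchmark_right():
    # Waits for the result, so timing includes the computation.
    fut = executor.submit(slow_square, 10)
    return fut.result()

t0 = time.perf_counter()
benchmark_wrong()
wrong_elapsed = time.perf_counter() - t0   # near zero: work not awaited

t0 = time.perf_counter()
result = benchmark_right()
right_elapsed = time.perf_counter() - t0   # includes the full 0.2 s task

print(f"unfetched: {wrong_elapsed:.3f}s, fetched: {right_elapsed:.3f}s")
executor.shutdown()
```

The unfetched version reports an elapsed time close to zero regardless of how expensive the task is, which is exactly why the missing fetch inflated Dagger's and Dask's apparent performance.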

Anyway, great work on these benchmarks, and I hope we can work together to improve the graph computing experience in Julia going forward!

@jpsamaroo thanks for pointing out the missing fetch; indeed, I was running them in Jupyter, and it seems the results were automatically fetched by Jupyter. I will update the benchmark results according to your script.

Yes, we are more than happy to work with you to provide the best graph computing tools for Julia developers!


Hi @jpsamaroo, I have updated the benchmark using the latest master version of Dagger.jl and included the relevant fetch/compute calls for Dagger.jl and Dask. The latest Dagger.jl is significantly faster than the older version; great work on speeding it up! Please take a look at the latest results and let us know if you have any additional comments or suggestions. I noticed that Dagger.jl throws errors for large N (> 100,000); I have reported the error in your GitHub gist linked above.


Thanks a bunch for also re-running with Dagger master! Your results match up with what I got.

The error looks like a race somewhere in MemPool or Dagger, but I didn't encounter it on my system while running benchmarks up to 500K. It might have something to do with the Julia or MemPool versions being used; I ran with Julia 1.7.2 and the latest MemPool.

I’m using Julia 1.6.6 (the LTS version) and MemPool v0.3.9. For me, the error happens sporadically; sometimes it happens for smaller N as well, but for N > 200K it occurs almost every time.