Hey folks! I’m very excited to announce version 0.7.0 of SuiteSparseGraphBLAS.jl! This is the first release in what I hope will be a period of bugfixes, convenience functions, and ecosystem development before v1.0!
SuiteSparseGraphBLAS.jl is a sparse linear algebra library with some extra tricks up its sleeve specialized for graph algorithms. It’s fast and multithreaded, and the underlying C library is developed by the illustrious Tim Davis (tamu.edu), who is responsible for our sparse solver ecosystem in SuiteSparse.jl.
If you’re interested in a very brief introduction to SuiteSparseGraphBLAS.jl, check out my blog post!
For a hint of the performance, check out this comparison to SparseArrays.jl on sparse * dense matrix multiplication (n x n * n x 32) from the blog post! In this specific case SuiteSparseGraphBLAS.jl hits up to 3x speedups on a single thread, and well over 30x speedups on 16 threads.
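If you want a rough comparison on your own machine, something along these lines should do it. This is only a sketch, not the exact script from the repo’s benchmarks directory; the size, density, and the commented-out thread setting are assumptions you may need to adapt:

```julia
using SparseArrays, BenchmarkTools
using SuiteSparseGraphBLAS

n = 100_000
A_csc = sprand(Float64, n, n, 5 / n)   # SparseArrays operand, ~5 nonzeros per row
B     = rand(Float64, n, 32)           # tall-and-skinny dense operand

A_gb = GBMatrix(A_csc)                 # convert to GraphBLAS matrices
B_gb = GBMatrix(B)

# gbset(:nthreads, 16)                 # optionally pin the GraphBLAS thread count

@btime $A_csc * $B;                    # SparseArrays.jl baseline
@btime $A_gb * $B_gb;                  # SuiteSparseGraphBLAS.jl
```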
Please give it a try if you have any sparse linear algebra or graph algorithm needs! And please feel free to reach out to me for more information!
This is great.
A big step toward making Julia’s sparse ecosystem the leading one in the field.
Could you add a performance comparison to MKL?
Ecosystem-wise, I think the next step is focusing on decompositions, and specifically on incomplete decompositions.
Nice! Could you please share the code for the benchmarks? I am particularly interested in the sparse matrix * dense vector case and would like to see how SuiteSparseGraphBLAS.jl compares with MKL and with ThreadedSparseCSR.jl.
To both you and @RoyiAvital: the benchmark code is in the repo under the top-level benchmarks directory. If you look in the blog post you can find a link to Tim Davis’ upcoming paper in ACM TOMS (or find it here: GraphBLAS/toms_parallel_grb2.pdf at stable · DrTimothyAldenDavis/GraphBLAS (github.com)). SuiteSparse:GraphBLAS, the C library, performs quite well against MKL in that paper, beating it in almost every case.
I will add a wider performance comparison soon, especially vs. MKL and ThreadedSparseCSR.jl. I suspect ThreadedSparseCSR.jl will perform just as well as SuiteSparseGraphBLAS.jl for matrices suited to whatever kernel it uses (dot-product? I’m not sure), but I’d have to look at the actual code and run the benchmarks. Really, I need to integrate better with SparseMatricesCSR.jl to do this.
Very impressive runtime results in Tim Davis’ paper.
I wish GraphBLAS had a version with 32-bit indices as well; it might give an extra boost for most matrices.
@rayegun, what’s your estimate of the overhead of your wrapper compared to the results in the paper?
If you check out this paper, it has some results against LAGraph. I will actively look for the sources of overhead in the future (it’s not totally clear to me why there’s as much as 10-15% overhead in some cases), although I expect we’re the fastest wrapper. (Stay tuned, there might be some pretty cool compiler trickery on the way that could theoretically improve this, although it’s intended for other purposes.)
OK. I hope you’ll be able to bring the overhead down to a minimum.
Who knows, maybe one day it will all be implemented in Julia and then exported as a dynamic/static library to be used in other languages.
I am the author of ThreadedSparseCSR.jl. I am not sure what you mean by the kernel used: it is just a very simple implementation of a sparse matrix * vector product using loops, adapted from ThreadedSparseArrays.jl to work with the CSR matrices from SparseMatricesCSR.jl. I would be very surprised if ThreadedSparseCSR.jl were competitive with SuiteSparseGraphBLAS.jl.
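Roughly, the kernel is a row-parallel loop along these lines (an illustrative sketch with the CSR storage passed as plain arrays, not the package’s actual code):

```julia
using Base.Threads

# Row-parallel CSR mat-vec: rowptr has length nrows + 1, and colval/nzval hold
# the column indices and values of the nonzeros, stored row by row.
function threaded_csr_mv!(y, rowptr, colval, nzval, x)
    @threads for row in 1:length(rowptr) - 1
        acc = zero(eltype(y))
        @inbounds for k in rowptr[row]:(rowptr[row + 1] - 1)
            acc += nzval[k] * x[colval[k]]
        end
        @inbounds y[row] = acc
    end
    return y
end
```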