Speed comparison: Sparse matrix multiplication vs usual matrix multiplication

carstenbauer · February 10, 2017, 12:58pm

Why is sparse * dense slower than dense * dense?

x = sprand(1000,1000,.1);
fx = full(x);
y = rand(1000,1000);

@time x*y;
  0.925036 seconds (31.94 k allocations: 9.071 MB)

@time x*y;
  0.885278 seconds (7 allocations: 7.630 MB)

@time fx*y;
  0.548416 seconds (398.70 k allocations: 21.551 MB, 11.83% gc time)

@time fx*y;
  0.086072 seconds (7 allocations: 7.630 MB, 59.55% gc time)

Reading #1831, I also tried a different sparse type

xa = Base.SparseArrays.CHOLMOD.Sparse(x);

@time xa*y;
  0.147444 seconds (51.25 k allocations: 17.378 MB)

@time xa*y;
  0.097434 seconds (5.47 k allocations: 15.518 MB, 5.12% gc time)

which is still not faster than dense * dense above (only for first call) but has vastly more allocations.

So, why is sparse * dense slower than dense * dense? Although I would also like to see an explanation, my primary interest is a convient way to exploit O(N^2) scaling of sparse multiplications over O(N^3) scaling for dense matrix multiplications.

To me this seems to be a highly important issue for many applications.

kristoffer.carlsson · February 10, 2017, 3:09pm

Because your sparse matrix isn’t very sparse at all.

carstenbauer · February 12, 2017, 8:04pm

Ok, I thought that 90% zeros is sparse. It seems like one needs ~99% zeros. Thank you for your answer.

Alexander-Barth · February 12, 2017, 9:12pm

here is a comparision with octave:

>> x  = sprand(1000,1000,0.1); fx = full(x); y = rand(1000,1000);
>> tic; x*y; toc()
Elapsed time is 0.448854 seconds.
>> tic; fx*y; toc()
Elapsed time is 0.077558 seconds.

while julia on my system (only the second invocation):

@time x*y;
  0.965171 seconds (7 allocations: 7.630 MB, 7.46% gc time)
@time fx*y;
  0.079769 seconds (7 allocations: 7.630 MB, 7.14% gc time)

So for full matrix the speed of julia is equivalent with octave, for sparse matrices octave is about 2x faster. For Matlab,
both products are about the same speed:

>> tic; x*y; toc()
Elapsed time is 0.092190 seconds.
>> tic; fx*y; toc()
Elapsed time is 0.081640 seconds.

Maybe, matlab has some heuristics which turn the matrix into a full matrix before multiplication. In my opinion, there is room for improvement for julia.

dpsanders · February 12, 2017, 11:54pm

Never benchmark in global scope ; see the performance tips in the manual.

kristoffer.carlsson · February 13, 2017, 8:04am

It is a good advice in general but it shouldn’t matter for this case since the time is large enough and only spent inside another function.

Alexander-Barth · February 13, 2017, 8:33am

Indeed, the result are quite similar with a local scope (and also with the @benchmark macro).

julia> function myfun()
       x = sprand(1000,1000,.1);
       fx = full(x);
       y = rand(1000,1000);
       @time x*y;
       @time fx*y;
       return nothing
       end

julia> myfun()
  0.897603 seconds (3 allocations: 7.630 MB)
  0.076642 seconds (3 allocations: 7.630 MB)

julia> myfun()
  0.894705 seconds (3 allocations: 7.630 MB)
  0.075856 seconds (3 allocations: 7.630 MB, 1.32% gc time)

Ralph_Smith · February 13, 2017, 2:37pm

Dense * dense products are handled by BLAS which (a) rearranges the computation to work on blocks so most memory accesses are in local cache, (b) works on sequential elements of memory so SIMD instructions are available, (c) has simple indexing structure so other run-time processor optimizations like prefetch work, and (d) is done by multi-threaded libraries on most Julia installations. Multi-threading would help many sparse * dense products, but is not yet standard in Julia itself. The other tools would only apply to sparse matrices with special structure.

Topic		Replies	Views
Sparse matrix vector faster than dense matrix vector? Performance	9	852	September 5, 2023
Performance discrepancy in sparse matrix product Performance	4	984	June 29, 2018
Sparse arrays allocation versus speed Performance sparsearrays	9	223	May 13, 2025
Sparse matrix-vector product: much more slow than Matlab Performance matlab , optimization	24	4545	December 20, 2017
Asymmetric speed of in-place `sparse*dense` matrix product General Usage	7	1531	November 8, 2018

Speed comparison: Sparse matrix multiplication vs usual matrix multiplication

Related topics