Slow arithmetic on views of sparse matrices


I was profiling some code that was surprisingly slow, and it turned out that the source of the issue was a sum over a view into a large sparse matrix. Here is an MWE that recreates it:

d = sprand(Bool,10000,10000, 0.01)
e = view(d, rand(1:10000,5000), rand(1:10000,9000))

using BenchmarkTools
@benchmark sum($d, 1)
  memory estimate:  78.20 KiB
  allocs estimate:  2
  minimum time:     421.354 μs (0.00% GC)
  median time:      450.828 μs (0.00% GC)
  mean time:        463.575 μs (0.00% GC)
  maximum time:     1.226 ms (0.00% GC)
  samples:          10000
  evals/sample:     1

@benchmark sum($e, 1)
  memory estimate:  585.06 KiB
  allocs estimate:  51
  minimum time:     3.286 s (0.00% GC)
  median time:      3.312 s (0.00% GC)
  mean time:        3.312 s (0.00% GC)
  maximum time:     3.339 s (0.00% GC)
  samples:          2
  evals/sample:     1

That’s a slowdown by a factor of 7000.

Is this 1) expected behaviour, and am I best off not using views into sparse matrices; 2) am I doing something wrong; or 3) is it a bug?

EDIT: f = view(d, :, :) has the same issue.



I would think this is expected: the sum of a sparse CSC matrix is trivial, since it just sums the internal vector storing the non-zero values. The sum of the view, on the other hand, falls back to the generic method sum(A::AbstractArray, region) at reducedim.jl:320, which looks up every index individually; for a CSC sparse matrix each lookup costs two pointer indirections, and most of them return zero. A 7000x slowdown therefore seems about right. I suspect a sum specialized for random access into a sparse matrix would be hard to write and not much more efficient (although I might be wrong).
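To make the two paths concrete, here is a scaled-down sketch (sizes reduced from the MWE so it runs quickly; on current Julia, sparse support lives in the SparseArrays stdlib): the fast path only touches stored entries, while the generic fallback indexes every position.

```julia
using SparseArrays

d = sprand(Bool, 1_000, 1_000, 0.01)

# Fast path: the specialized sparse method only visits stored entries.
fast = sum(nonzeros(d))

# Slow path, roughly what the generic fallback does: index every (i, j)
# pair, which for CSC storage means a search through the column's entries.
function naive_sum(A)
    s = 0
    for j in 1:size(A, 2), i in 1:size(A, 1)
        s += A[i, j]
    end
    return s
end

@assert fast == naive_sum(d) == sum(d)
```

The fast path does O(nnz) work; the naive one does O(m·n) lookups, each itself non-constant, which is where the huge gap comes from.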


Depending on what your real use case is, if you’re trying to sum elements only within some contiguous subrange of rows and columns, it wouldn’t be hard to write a specialized routine for SparseMatrixCSC storage. Views of sparse matrices are not well optimized since they fall back to generic operations which aren’t aware of the sparsity.
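Such a routine could look like the following sketch (rangesum is a hypothetical name, not a Base function): walk only the stored entries of the touched columns via nzrange and keep those whose row index falls in the requested range.

```julia
using SparseArrays

# Hypothetical helper: sum A[rows, cols] without materializing the slice,
# visiting only the stored non-zeros of the touched columns.
function rangesum(A::SparseMatrixCSC, rows::UnitRange, cols::UnitRange)
    rv = rowvals(A)
    nz = nonzeros(A)
    s = zero(eltype(A))
    for j in cols
        for k in nzrange(A, j)   # positions of column j's stored entries
            if rv[k] in rows
                s += nz[k]
            end
        end
    end
    return s
end

A = sprand(200, 200, 0.05)
@assert rangesum(A, 10:150, 20:180) ≈ sum(A[10:150, 20:180])
```

This costs O(nnz of the touched columns) rather than O(length of the subrange), and allocates nothing.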


So is the recommended approach to dispense with the view and just slice the matrix instead?


Slicing would make a copy, which you don’t necessarily need. Depends how sensitive you are to the performance here vs how much special-purpose code you want to write for a particular operation.
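For reference, here is a scaled-down sketch of the slicing option (using the current dims= keyword for the reduction): d[i, j] materializes a new SparseMatrixCSC, after which the fast sparse sum applies; the copy is the price paid for the fast path.

```julia
using SparseArrays

d = sprand(Bool, 1_000, 1_000, 0.01)
i = rand(1:1_000, 500)
j = rand(1:1_000, 900)

sub = d[i, j]                    # a copy, but it stays sparse
@assert sub isa SparseMatrixCSC
@assert sum(sub, dims=1) == sum(view(d, i, j), dims=1)  # same answer, fast path
```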


Yes, I realize it’ll make a copy, but in the end I think it’ll be a lot faster. Even though the matrices are sparse, 1e6 stored elements is realistic for my use. I am sensitive to the performance, but the problem is that the indices are never contiguous in my actual use case, and I think writing the specialized code is beyond me.


Do any of you know where the specialized sum code for SparseMatrixCSC is located? I can’t seem to find it in Base.


Probably calls into this?


Use @edit sum(sq) to look at the code; essentially it boils down to sum(s.nzval). Anyway, have a look at the docs for nzrange for an example of iterating over the non-zeros of a sparse matrix.
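As a sketch of that iteration pattern from the nzrange docs, here it is used to build per-column sums by visiting only stored entries (which is essentially a specialized sum along dimension 1):

```julia
using SparseArrays

A = sprand(1_000, 1_000, 0.01)
colsums = zeros(eltype(A), 1, size(A, 2))
nz = nonzeros(A)
for j in 1:size(A, 2)
    for k in nzrange(A, j)   # positions of column j's stored entries
        colsums[1, j] += nz[k]
    end
end
@assert colsums ≈ sum(A, dims=1)
```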


If you use a matrix product instead of a view to compute the sum, it’ll be fast.

d = sprand(Bool,10000,10000, 0.01)
i = rand(1:10000,5000)
j = rand(1:10000,9000)
e = view(d, i, j)

s = sum(e,1)
t = (sparse(ones(Int,length(i)), i, ones(Int,length(i)),1,10000)*d)[j]
@assert s[:] == t
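A scaled-down sketch of why this works: the sparse indicator row vector counts how often each row of d appears in i, so the product reproduces the multiplicity-weighted column sums of the view; an indicator column on the right extends the same trick to the total sum(e).

```julia
using SparseArrays

d = sprand(Bool, 1_000, 1_000, 0.01)
i = rand(1:1_000, 500)
j = rand(1:1_000, 900)
e = view(d, i, j)

# Entry k of rowsel counts occurrences of row k in i; entry k of colsel
# counts occurrences of column k in j.
rowsel = sparse(ones(Int, length(i)), i, ones(Int, length(i)), 1, 1_000)
colsel = sparse(j, ones(Int, length(j)), ones(Int, length(j)), 1_000, 1)

@assert vec(sum(e, dims=1)) == vec(rowsel * d)[j]
@assert sum(e) == (rowsel * d * colsel)[1, 1]
```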


That’s plain amazing :slight_smile:


I’ve opened a feature request for the general functionality:


The real problem is one of iteration—you want to visit just the relevant entries in an arbitrary AbstractArray. might be a solution some day, but it needs some compiler improvements to be blazingly fast. (See for more information.)

It still might be worth playing with—in particular, it might be worth “finishing” (I haven’t had the time lately) and registering as a package, even if it’s not as fast as it could be someday. If anyone wants to join the fun, I’d be happy to help coach folks through anything that’s not clear.