Getindex A[i,j] wrong complexity on SparseMatrixCSC?

bonevbs · April 14, 2021, 1:52pm

The complexity of extracting a sparse submatrix A[i,j] of size k x l from a large sparse matrix of size n with nnz entries should be O(k log k + l nz log k).

Accessing my sparse matrices seemed very slow, so I checked the algorithms. They seem fairly optimized, but the complexity does not match the one I stated above. I ran some benchmarks, keeping the number of nonzero entries per column and the size of the k x l submatrix constant while scaling the overall problem size. The extraction time appears to scale linearly with the problem size, which should be avoided at all cost!

Is this a misunderstanding on my side/did I miss something? Could someone explain to me the dependency on the total size of the matrix? In my opinion this shouldn’t be happening with CSC matrices.

I have uploaded my tests online, you can run them as Pluto notebook or just as a script:

github.com

bonevbs/nla-notebooks/blob/main/sparse_getindex.jl

### A Pluto.jl notebook ###
# v0.12.21

using Markdown
using InteractiveUtils

# ╔═╡ 9976914c-9bc5-11eb-3b13-abe73dd8f7c4
using SparseArrays, BenchmarkTools, DataFrames, Random, Plots

# ╔═╡ bd3807cc-9eaf-11eb-030f-1da9d12e7488
@eval SparseArrays include("getindex_version3.jl");

# ╔═╡ a8f932f0-9c6a-11eb-3d59-fbaba6e0f95f
md"
# Extraction of submatrices from sparse matrices in CSC format
"

# ╔═╡ 164e5d30-9fc2-11eb-0526-b339c70cb415
theme(:bright)


Oscar_Smith · April 14, 2021, 2:16pm

Isn’t this the expected complexity? You said you expected O(K*log(k) + L*nz*log(K)) which is more than linear in K and L.

bonevbs · April 14, 2021, 2:19pm

k and l are the sizes of I and J, and nz is the number of nonzero entries per row. The plot shows the timing when the overall size of A is increased while keeping k, l, nz constant. So no, this is not what I would expect.

Oscar_Smith · April 14, 2021, 2:23pm

Can you upload a version of the plot with labeled axes?

bonevbs · April 14, 2021, 2:24pm

good point, let me correct that

mbauman · April 14, 2021, 2:29pm

Here’s the relevant section of the notebook, since it wasn’t clear to me what you were measuring by your description alone:

github.com

bonevbs/nla-notebooks/blob/9aceb046281e978f5f6fb8df230a04d22e78d3be/sparse_getindex.jl#L36-L51


      
    for r = 1:length(df[:,1])
        n = df.n[r]
        ρ = nrow/n
        println("Running it for n = ", n)
        A = sprandn(n,n,ρ)
        df[r, "average nnz per row"] = nnz(A)/n
        i = randperm(n)[1:k]; j = randperm(n)[1:k]
        b = @benchmark $A[$i,$j]
        println(b)
        df[r, "extraction time"] = mean(b.times)
        df[r, "memory used"] = b.memory
        i = collect(100:200)
        b = @benchmark SparseArrays.getindex_I_sorted_bsearch_I($A, $i, $j)
        df[r, "extraction time (sorted)"] = mean(b.times)
        df[r, "memory used (sorted)"] = b.memory
    end

In short, you’re looking specifically at getindex(A::SparseMatrixCSC, I::Vector{Int}, J::Vector{Int}), with I sometimes sorted (and contiguous).
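
For anyone who wants to reproduce the measurement without the notebook, here is a minimal sketch of the same experiment. It is not the notebook's code: `time_extraction` is a hypothetical helper, it uses `@elapsed` from Base instead of BenchmarkTools, and the defaults for k and the nonzeros per column are assumptions.

```julia
using SparseArrays, Random

# Hypothetical helper: time A[I, J] for a given n while k and the expected
# number of nonzeros per column stay fixed.
function time_extraction(n; k = 100, nzpercol = 10, reps = 20)
    A = sprandn(n, n, nzpercol / n)   # density chosen so nnz per column is about nzpercol
    I = randperm(n)[1:k]              # k distinct, unsorted row indices
    J = randperm(n)[1:k]              # k distinct, unsorted column indices
    A[I, J]                           # warm-up call so compilation is not timed
    return minimum([(@elapsed A[I, J]) for _ in 1:reps])
end

for n in (10^3, 10^4, 10^5)
    println("n = ", n, "  time = ", time_extraction(n), " s")
end
```

If the extraction cost depended only on k, l and nz, the printed times would stay roughly flat as n grows.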

bonevbs · April 14, 2021, 2:31pm

yes, I am trying to verify the complexity of getindex being quasilinear w.r.t. k, l, nz. In my understanding there should be no scaling with the problem size.

I can pre-sort I if necessary but that doesn’t seem to be the issue here. (I have also updated the plot)

mbauman · April 14, 2021, 2:41pm

Some of these indexing implementations use a cache the length of the entire column (and not just the subset), which does indeed seem suboptimal. That’s likely where this is coming from. Finding a way to limit that to the size of the index would be great.

github.com

JuliaLang/julia/blob/2fed7c34739cfe97610fa648b8ddc29b3c5141d6/stdlib/SparseArrays/src/sparsematrix.jl#L2318


      
    end

    function getindex_I_sorted_linear(A::AbstractSparseMatrixCSC{Tv,Ti}, I::AbstractVector, J::AbstractVector) where {Tv,Ti}
        require_one_based_indexing(A, I, J)
        nI = length(I)
        nJ = length(J)

        colptrA = getcolptr(A); rowvalA = rowvals(A); nzvalA = nonzeros(A)
        colptrS = Vector{Ti}(undef, nJ+1)
        colptrS[1] = 1
        cacheI = zeros(Int, size(A, 1))

        ptrS   = 1
        # build the cache and determine result size
        @inbounds for j = 1:nJ
            col = J[j]
            ptrI::Int = 1 # runs through I
            ptrA::Int = colptrA[col]
            stopA::Int = colptrA[col+1]
            while ptrI <= nI && ptrA < stopA
                rowA = rowvalA[ptrA]

github.com

JuliaLang/julia/blob/2fed7c34739cfe97610fa648b8ddc29b3c5141d6/stdlib/SparseArrays/src/sparsematrix.jl#L2379-L2383


      
    m = size(A, 1)

    # cacheI is used first to store num occurrences of each row in columns of interest
    # and later to store position of first occurrence of each row in I
    cacheI = zeros(Int, m)
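
To make the effect of that allocation concrete, here is a small sketch that isolates it. `full_column_cache` is a hypothetical name that simply mirrors the `zeros(Int, size(A, 1))` line from the excerpts; the matrix holds no stored entries, so any growth comes from the cache alone.

```julia
using SparseArrays

# Hypothetical name; mirrors the cache allocation shown in the excerpts above.
full_column_cache(A) = zeros(Int, size(A, 1))

for m in (10^4, 10^5, 10^6)
    A = spzeros(m, 10)                 # m rows, 10 columns, no stored entries
    full_column_cache(A)               # warm-up call so compilation is not timed
    t = @elapsed full_column_cache(A)
    println("m = ", m, "  cache bytes = ", sizeof(full_column_cache(A)), "  time = ", t, " s")
end
```

The cache size grows with the number of rows of A, independent of how many rows are actually requested.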

bonevbs · April 14, 2021, 2:51pm

Oh wow - thank you for finding this. I was questioning my sanity.

I had completely overlooked this. This seems quite problematic to me. The whole point of sparse matrices is to avoid any scaling related to the overall problem-size. The stated complexity is for getindex_I_sorted_bsearch_I… Using a cache like this seems to undermine the entire implementation of the algorithm…

mbauman · April 14, 2021, 2:55pm

Yup, definitely should be improved. Want to take a crack at a pull request? I’m happy to help along the way if it’s new to you.

bonevbs · April 14, 2021, 3:00pm

Definitely. I'm glad to accept your help as well, since I have never done this before. I will have a crack at it!

mbauman · April 14, 2021, 3:45pm

Just to get started, a really helpful interactive development/debugging technique is to copy the existing implementation into your favorite IDE/REPL, edit it, and then update the definition with an @eval SparseArrays to re-evaluate it in the context of the SparseArrays module. You may also want to remove @inbounds while developing to make errors more obvious.
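
A minimal sketch of that workflow, for illustration only. `demo_colcounts` is a hypothetical function name chosen so that nothing in the stdlib gets overwritten; to iterate on an existing routine you would evaluate a definition with the same name and signature instead.

```julia
using SparseArrays

@eval SparseArrays begin
    # Evaluated inside the SparseArrays module, so internal names such as
    # getcolptr resolve without the SparseArrays. prefix.
    function demo_colcounts(A::AbstractSparseMatrixCSC)
        colptr = getcolptr(A)
        return [colptr[j + 1] - colptr[j] for j in 1:size(A, 2)]
    end
end

A = sparse([1, 2, 3], [1, 1, 3], [1.0, 2.0, 3.0], 3, 3)
SparseArrays.demo_colcounts(A)   # stored entries per column
```

Re-running the `@eval` block after an edit replaces the method in the current session, which keeps the edit/test loop short.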

Once you have an implementation that makes you happy, you can just edit it in the browser on GitHub directly on the page I linked above.

bonevbs · April 14, 2021, 3:46pm

Thank you so much - this is very helpful! I had copied the function into the above notebook but I think I will follow your advice

bonevbs · April 17, 2021, 9:25pm

Hi, I had some time to have a go at it and wrote new routines for getindex_I_sorted_linear and getindex_I_sorted_bsearch_I. As pointed out above, the existing ones had suboptimal complexities that depend on the overall problem size, which can be quite catastrophic in scaling studies, for instance. My new routines are probably not as optimised as the existing ones (I have little experience in optimising Julia code using @simd and @inbounds), so I would be grateful if someone could have a look at them.

That being said, the complexities are now correct, as can be verified in the repository that I have linked above. I have also attached some plots that prove this.

The code can be found here: https://github.com/bonevbs/nla-notebooks/blob/main/mygetindex.jl

bonevbs · April 17, 2021, 9:34pm

I should note that getindex_I_sorted_bsearch_I is the function I would most like optimisation tips for, as I feel the algorithm is the correct one and satisfactory enough to be put into a pull request.
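
To illustrate the kind of algorithm meant here, this is a rough sketch of extraction by binary search in I. It is neither the stdlib routine nor the code from the pull request: `submatrix_bsearch` is a hypothetical name, and it assumes I is sorted and free of duplicates.

```julia
using SparseArrays

# Hypothetical sketch; assumes I is sorted and contains no duplicates.
function submatrix_bsearch(A::SparseMatrixCSC{Tv,Ti}, I::AbstractVector{<:Integer},
                           J::AbstractVector{<:Integer}) where {Tv,Ti}
    rows = rowvals(A); vals = nonzeros(A)
    colptrS = ones(Ti, length(J) + 1)
    rowvalS = Ti[]; nzvalS = Tv[]
    for (jj, col) in enumerate(J)
        for p in nzrange(A, col)                  # stored entries of column `col`
            pos = searchsortedfirst(I, rows[p])   # O(log k) lookup in I
            if pos <= length(I) && I[pos] == rows[p]
                push!(rowvalS, pos); push!(nzvalS, vals[p])
            end
        end
        colptrS[jj + 1] = length(rowvalS) + 1     # close column jj of the result
    end
    return SparseMatrixCSC(length(I), length(J), colptrS, rowvalS, nzvalS)
end

A = sprandn(200, 200, 0.05)
Isel = [3, 10, 57, 120]; Jsel = [5, 1, 77]
submatrix_bsearch(A, Isel, Jsel) == A[Isel, Jsel]
```

The work per selected column is the number of stored entries in that column times a log k search, and no buffer of length size(A, 1) is allocated.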

For getindex_I_sorted_linear, I am currently using a Dict{Int,Int}, which effectively implements a hashmap. This seems to be quite slow though, even though at some point it would win. For index sets that are relatively compressed one could stick with the current implementation and just reduce the cache to the envelope of the index set. I am not entirely sure whether I should write a custom hashmap algorithm for this one…
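
The two lookup structures mentioned here can be sketched as follows. Both helper names are hypothetical and this is not the code from mygetindex.jl; it only shows how the memory footprint differs.

```julia
# Hash map from row index to its position in I; memory is O(length(I)).
row_positions_dict(I) = Dict{Int,Int}(row => pos for (pos, row) in enumerate(I))

# Dense table over minimum(I):maximum(I) only; memory is
# O(maximum(I) - minimum(I) + 1) instead of O(size(A, 1)).
function row_positions_envelope(I)
    lo, hi = extrema(I)
    table = zeros(Int, hi - lo + 1)
    for (pos, row) in enumerate(I)
        table[row - lo + 1] = pos      # 0 stays in place for rows that are not selected
    end
    return table, lo
end

Isel = [100, 103, 105]
d = row_positions_dict(Isel)
table, lo = row_positions_envelope(Isel)

get(d, 103, 0)          # 2: row 103 is the second entry of Isel
table[103 - lo + 1]     # 2: same lookup through the envelope table
get(d, 104, 0)          # 0: row 104 is not selected
```

The envelope table keeps the cheap array lookup of the current implementation when the index set is clustered, while the Dict stays small even when the indices are spread over the whole row range.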

Oscar_Smith · April 18, 2021, 5:22am

I think you should probably make the pull request now. There are probably a bunch of possible improvements, but making a PR is a pretty decent way to get eyes on it.

bonevbs · April 18, 2021, 7:51am

OK, thanks for the input. I made the pull request here: Change getindex algorithms on sparse matrices by bonevbs · Pull Request #40519 · JuliaLang/julia · GitHub

viralbshah · April 19, 2021, 8:48pm

Can you also try very large n (> 10^6) and < 1 nonzero/row?

-viral

bonevbs · April 19, 2021, 11:30pm

This should be covered by getindex_I_sorted_bsearch_A, right? I will have some time on the weekend to look at it.
