The complexity of extracting a sparse submatrix A[I, J] of size k × l from a large sparse matrix of size n with nnz stored entries should be O(k log k + l · nz · log k), that is, independent of n.

Accessing my sparse matrices seemed very slow, so I checked the algorithms. They seem fairly optimized, but the complexity does not match the one I stated above. I ran some benchmarks, keeping the number of nonzero entries per column constant, as well as the k × l submatrix, while scaling the overall problem size. The timing scales linearly with the problem size, which should be avoided at all costs!
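A minimal sketch of the kind of benchmark described above (an assumed setup for illustration, not the exact script from the linked repository): extract a fixed k × l block from sparse matrices of growing size n and time the indexing. The function name bench_getindex and all parameters are hypothetical.

```julia
# Sketch: time A[I, J] for fixed k, l, nz_per_col while n grows.
using SparseArrays, Random

function bench_getindex(n; k=100, l=100, nz_per_col=10)
    Random.seed!(0)
    A = sprand(n, n, nz_per_col / n)   # roughly nz_per_col entries per column
    I = sort(rand(1:n, k))             # sorted row index set of size k
    J = sort(rand(1:n, l))             # sorted column index set of size l
    A[I, J]                            # warm up / force compilation
    @elapsed A[I, J]                   # seconds for one extraction
end

# If getindex were independent of n, these timings should stay roughly flat.
for n in (10^4, 10^5, 10^6)
    println(n, " => ", bench_getindex(n))
end
```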

Is this a misunderstanding on my side, or did I miss something? Could someone explain the dependency on the total size of the matrix? In my opinion this shouldn’t be happening with CSC matrices.

I have uploaded my tests online; you can run them as a Pluto notebook or just as a script:

k and l are the sizes of I and J, and nz is the number of nonzero entries per row. The plot shows the timing as the overall size of A is increased while keeping k, l, and nz constant. So no, this is not what I would expect.

Yes, I am trying to verify that the complexity of getindex is quasilinear w.r.t. k, l, and nz. In my understanding there should be no scaling with the problem size.

I can pre-sort I if necessary, but that doesn’t seem to be the issue here. (I have also updated the plot.)

Some of these indexing implementations use a cache the length of the entire column (rather than just the subset), which does indeed seem suboptimal. That’s likely where this is coming from. Finding a way to limit the cache to the size of the index set would be great.

Oh wow - thank you for finding this. I was questioning my sanity.

I had completely overlooked this. It seems quite problematic to me: the whole point of sparse matrices is to avoid any scaling with the overall problem size. The stated complexity is for getindex_I_sorted_bsearch_I, and using a cache like this undermines the point of the algorithm…

Just to get started, a really helpful interactive development/debugging technique is to copy the existing implementation into your favorite IDE/REPL, edit it, and then re-evaluate it in the context of the SparseArrays module with @eval SparseArrays. You may also want to remove @inbounds while developing to make errors more obvious.
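The workflow above can be sketched roughly as follows. The function demo_helper is a placeholder standing in for whichever getindex method you copied and edited; it is not a real SparseArrays function.

```julia
# Sketch: redefine (or add) a function inside the SparseArrays module from
# your own session, so unexported helpers and internal types resolve there.
using SparseArrays

@eval SparseArrays begin
    # Paste the copied-and-edited implementation here. As a placeholder:
    function demo_helper(x)   # hypothetical name, for illustration only
        x + 1
    end
end

SparseArrays.demo_helper(1)   # the new definition is now live in the module
```

Because the body is evaluated inside SparseArrays, you can iterate on internal methods without rebuilding the stdlib.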

Once you have an implementation that makes you happy, you can edit it directly in the browser on the GitHub page I linked above.

Hi, I had some time to have a go at it, and I wrote new routines for getindex_I_sorted_linear and getindex_I_sorted_bsearch_I. As pointed out above, they had suboptimal complexities with a dependency on the overall problem size, which can be quite catastrophic when one does scaling studies, for instance. However, my new routines are probably not quite as optimised (I have little experience optimising Julia code with @simd and @inbounds), so I would be grateful if someone could have a look.

That being said, the complexities are now correct, as can be verified in the repository linked above. I have also attached some plots that demonstrate this.

I should note that getindex_I_sorted_bsearch_I is the function I would most like optimisation tips for; I feel the algorithm itself is correct and ready to go into a pull request.

For getindex_I_sorted_linear, I am currently using a Dict{Int,Int} as a hashmap. This seems quite slow for small problems, even though at some point it should win. For index sets that are relatively compact, one could stick with the current implementation and just shrink the cache to the envelope of the index set. I am not entirely sure whether I should write a custom hashmap for this…
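To make the trade-off concrete, here is a minimal sketch (not the actual PR code) of the Dict-based idea: instead of a dense row cache of length size(A, 1), store positions only for the rows that actually appear in I, so the scatter step scales with k rather than with the full matrix height. The function name extract_with_dict_cache is hypothetical, and the sketch assumes I contains no duplicates.

```julia
# Sketch: CSC submatrix extraction with a Dict cache keyed only on rows in I.
using SparseArrays

function extract_with_dict_cache(A::SparseMatrixCSC, I::Vector{Int}, J::Vector{Int})
    rowpos = Dict{Int,Int}()           # row index in A => output row position
    for (p, i) in pairs(I)
        rowpos[i] = p                  # assumes I has no duplicate entries
    end
    rows = rowvals(A); vals = nonzeros(A)
    Ti, Tj, Tv = Int[], Int[], eltype(A)[]
    for (q, j) in pairs(J)
        for p in nzrange(A, j)         # scan the stored entries of column j
            r = get(rowpos, rows[p], 0)
            if r != 0                  # keep only rows selected by I
                push!(Ti, r); push!(Tj, q); push!(Tv, vals[p])
            end
        end
    end
    sparse(Ti, Tj, Tv, length(I), length(J))
end
```

The cost is O(k) to build the Dict plus O(l · nz) expected for the column scans, with no term in size(A, 1); the constant factor of Dict lookups versus a plain Vector cache is exactly the slowdown discussed above.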

I think you should probably make the pull request now. There are probably a bunch of improvements left, but opening a PR is a pretty decent way to get eyes on it.