How to get a sub matrix of a large sparse matrix / array efficiently?

mlanghinrichs · March 30, 2020, 1:47pm

Hi all,

I have a large sparse matrix / 2-d-array, some random indices and want to obtain a sub matrix of the large sparse matrix for these random indices. I noticed that a view() is quite fast for doing this (order of ~ 500 ns), however I struggle to get a copy of the sub matrix at a similar performance (order of ~ 5 ms). Please see my benchmark below. The matrices are quadratic in my specific case. Is there a way to get the sub matrix more efficiently (as a copy)? I kind of need a copy/SparseMatrixCSC again, because I cannot use sparse arithmetics on the view (see Slow arithmetic on views of sparse matrices and https://github.com/JuliaLang/julia/issues/21796). I’m very happy for any help!

using SparseArrays
using Random
using BenchmarkTools

const n = 1000000
const sparr = sprand(n, n, 0.0001) # a large sparse matrix
const subinds = unique(sort(rand(1:n, 1000))) # indices for the sub matrix

const sparr_sub_view = view(sparr, subinds, subinds) # sub matrix view type
const sparr_sub_copy = sparr[subinds, subinds] # sub matrix copy/SparseMatrixCSC type
sparr_sub_copy == SparseMatrixCSC(sparr_sub_view) # this is 'true' as a check

# benchmarks:
@btime view(sparr, subinds, subinds) # 566.207 ns (3 allocations: 112 bytes)
@btime sparr[subinds, subinds] # 4.749 ms (7 allocations: 7.64 MiB)

Tamas_Papp · March 30, 2020, 1:55pm

Note that you are benchmarking instantiating a view object, which is a very cheap operation.

Possibly the relevant benchmark is using the result (of view or the submatrix), and you may find that making a copy is faster in the end, but this depends on what you are doing.

mlanghinrichs · March 30, 2020, 2:01pm

yes, working with the copy appears faster for me; also due to the fact that a view object does not have sparse methods such as findnz(), rowvals() and nonzeros(). (Maybe there is a way to get this for a view?? I just didn’t find it yet.)

But is this plain slicing to get the copy really the fastest way here (sparr_sub_copy = sparr[subinds, subinds])? I hoped that there is a more clever solution using sparse matrix iteration, but I was not successfull to improve this so far.

Tamas_Papp · March 30, 2020, 2:18pm

These could be implemented, but I don’t think it has been done (yet).

rveltz · March 30, 2020, 4:48pm

I stumbled upon the same problem and it is not super easy to get what you want because of the compressed format. I think the simplest is to work at the level of the vectors I,J,K given by findnz.

mlanghinrichs · March 30, 2020, 5:54pm

yeah, I don’t know, what I tried was:

function getsubmat(sparr, subinds)
        sparr_sub = spzeros(length(subinds), length(subinds))
        lookup = Dict(subinds .=> 1:length(subinds))

        rows = rowvals(sparr)
        vals = nonzeros(sparr)

        for j=1:size(sparr_sub)[2]
                for i in nzrange(sparr, subinds[j])
                        row = rows[i]
                        val = vals[i]
                        if row in subinds
                                sparr_sub[lookup[row], j] = val
                        end
                end
        end
        sparr_sub
end

const sparr_sub_copy2 = getsubmat(sparr, subinds)
sparr_sub_copy == sparr_sub_copy2 # is 'true' as a check

@btime getsubmat(sparr, subinds) # 52.227 ms (36 allocations: 120.47 KiB)

This makes use of the sparse matrix iteration scheme using rowvals() and nonzeros(). It allocates less memory compared to the slicing version, but is terribly slow. All quite unsatisfactory. It seems to me that there must be a better way to “just” read out a submatrix in a below milli seconds range, but I cannot see it at the moment…

mlanghinrichs · March 30, 2020, 5:59pm

Could you elaborate on this a bit more?

rveltz · March 30, 2020, 6:08pm

You loop over I,J to find if the indices are in your subindices. You get new Isub, Jsub and Ksub and give this to sparse

mlanghinrichs · March 30, 2020, 7:03pm

Something like this? I just sketched the code; the for loop is fast (~100 ns), however the if’s make it extremely slow.

const I, J, K = findnz(sparr)

function getsubmat(I, J, K, subinds)
        Isub = Int[]
        Jsub = Int[]
        Ksub = Float64[]

        for (i, j, k) in zip(I, J, K)
                if j ∈ subinds && i ∈ subinds
                        # do stuff
                end
        end
        # sparse(Isub, Jsub, Ksub)
end

@btime getsubmat(I, J, K, subinds) # 53.698 s (3 allocations: 240 bytes)

rveltz · March 30, 2020, 7:26pm

you should propably loop over subinds instead and take advantage of I being ordered

mlanghinrichs · March 30, 2020, 8:22pm

I think I don’t see what you mean… This is then basically what I tried above already, isn’t it?

This is a loop over subinds to get the columns of sparr

mlanghinrichs · March 30, 2020, 8:51pm

To extend my previous answer a bit, another version would be

function getsubmat(sparr, subinds)
        Isub = Int[]
        Jsub = Int[]
        Ksub = Float64[]

        lookup = Dict(subinds .=> 1:length(subinds))

        rows = rowvals(sparr)
        vals = nonzeros(sparr)

        for j=1:length(subinds)
                for i in nzrange(sparr, subinds[j])
                        row = rows[i]
                        val = vals[i]

                        if row in subinds
                                push!(Isub, lookup[row])
                                push!(Jsub, j)
                                push!(Ksub, val)
                        end
                end
        end
        sparse(Isub, Jsub, Ksub)
end

@btime getsubmat(sparr, subinds) # 55.237 ms (54 allocations: 141.97 KiB)

which has again quite similar (terrible) performance.

Topic		Replies	Views
Is working with SubArrays of a SparseMatrix a bad idea? General Usage question , bug , sparse , views	9	372	November 9, 2022
Sparse matrix and sparse subarray subtraction returns full arrays Numerics sparse	6	407	October 11, 2022
Why operation on return of @views take a long time Performance	7	188	September 6, 2024
Slow arithmetic on views of sparse matrices General Usage	12	2293	May 11, 2017
Sparse matrices copy arrays, do not view them Performance question , sparse , matrices	18	509	January 31, 2024

How to get a sub matrix of a large sparse matrix / array efficiently?

Related topics