I have a large sparse matrix / 2-d-array, some random indices and want to obtain a sub matrix of the large sparse matrix for these random indices. I noticed that a view() is quite fast for doing this (order of ~ 500 ns), however I struggle to get a copy of the sub matrix at a similar performance (order of ~ 5 ms). Please see my benchmark below. The matrices are quadratic in my specific case. Is there a way to get the sub matrix more efficiently (as a copy)? I kind of need a copy/SparseMatrixCSC again, because I cannot use sparse arithmetics on the view (see Slow arithmetic on views of sparse matrices and https://github.com/JuliaLang/julia/issues/21796). Iām very happy for any help!
using SparseArrays
using Random
using BenchmarkTools
const n = 1000000
const sparr = sprand(n, n, 0.0001) # a large sparse matrix
const subinds = unique(sort(rand(1:n, 1000))) # indices for the sub matrix
const sparr_sub_view = view(sparr, subinds, subinds) # sub matrix view type
const sparr_sub_copy = sparr[subinds, subinds] # sub matrix copy/SparseMatrixCSC type
sparr_sub_copy == SparseMatrixCSC(sparr_sub_view) # this is 'true' as a check
# benchmarks:
@btime view(sparr, subinds, subinds) # 566.207 ns (3 allocations: 112 bytes)
@btime sparr[subinds, subinds] # 4.749 ms (7 allocations: 7.64 MiB)
Note that you are benchmarking instantiating a view object, which is a very cheap operation.
Possibly the relevant benchmark is using the result (of view or the submatrix), and you may find that making a copy is faster in the end, but this depends on what you are doing.
yes, working with the copy appears faster for me; also due to the fact that a view object does not have sparse methods such as findnz(), rowvals() and nonzeros(). (Maybe there is a way to get this for a view?? I just didnāt find it yet.)
But is this plain slicing to get the copy really the fastest way here (sparr_sub_copy = sparr[subinds, subinds])? I hoped that there is a more clever solution using sparse matrix iteration, but I was not successfull to improve this so far.
I stumbled upon the same problem and it is not super easy to get what you want because of the compressed format. I think the simplest is to work at the level of the vectors I,J,K given by findnz.
function getsubmat(sparr, subinds)
sparr_sub = spzeros(length(subinds), length(subinds))
lookup = Dict(subinds .=> 1:length(subinds))
rows = rowvals(sparr)
vals = nonzeros(sparr)
for j=1:size(sparr_sub)[2]
for i in nzrange(sparr, subinds[j])
row = rows[i]
val = vals[i]
if row in subinds
sparr_sub[lookup[row], j] = val
end
end
end
sparr_sub
end
const sparr_sub_copy2 = getsubmat(sparr, subinds)
sparr_sub_copy == sparr_sub_copy2 # is 'true' as a check
@btime getsubmat(sparr, subinds) # 52.227 ms (36 allocations: 120.47 KiB)
This makes use of the sparse matrix iteration scheme using rowvals() and nonzeros(). It allocates less memory compared to the slicing version, but is terribly slow. All quite unsatisfactory. It seems to me that there must be a better way to ājustā read out a submatrix in a below milli seconds range, but I cannot see it at the momentā¦
Something like this? I just sketched the code; the for loop is fast (~100 ns), however the ifās make it extremely slow.
const I, J, K = findnz(sparr)
function getsubmat(I, J, K, subinds)
Isub = Int[]
Jsub = Int[]
Ksub = Float64[]
for (i, j, k) in zip(I, J, K)
if j ā subinds && i ā subinds
# do stuff
end
end
# sparse(Isub, Jsub, Ksub)
end
@btime getsubmat(I, J, K, subinds) # 53.698 s (3 allocations: 240 bytes)
To extend my previous answer a bit, another version would be
function getsubmat(sparr, subinds)
Isub = Int[]
Jsub = Int[]
Ksub = Float64[]
lookup = Dict(subinds .=> 1:length(subinds))
rows = rowvals(sparr)
vals = nonzeros(sparr)
for j=1:length(subinds)
for i in nzrange(sparr, subinds[j])
row = rows[i]
val = vals[i]
if row in subinds
push!(Isub, lookup[row])
push!(Jsub, j)
push!(Ksub, val)
end
end
end
sparse(Isub, Jsub, Ksub)
end
@btime getsubmat(sparr, subinds) # 55.237 ms (54 allocations: 141.97 KiB)
which has again quite similar (terrible) performance.