Constructing a sparse matrix via the following is slow:
n = 200
m = 250
A = spzeros(n,m)
for i in 1:n
for j in 1:m
A[i,j] = rand()
end
end
The way to speed this up is to use the following syntax:
cur_I = Int[]
cur_J = Int[]
cur_V = Number[]
for i in 1:n
append!(cur_I, repeat([i], m))
for j in 1:m
push!(cur_J, j)
push!(cur_V, rand())
end
end
A = sparse(cur_I, cur_J, cur_V, n, m)
This begs the question, why isn’t there a constructor that can lazily put off construction of a sparse matrix until all the elements are ready?
The pseudocode for the result would be something like:
A = spzeros_lazy(n,m)
for i in 1:n
for j in 1:m
A[i,j] = rand()
end
end
lazy_sparse_constructor!(A)
// the goal is not to be the most performant, just performant && easy to code
1 Like
Simple enough to code:
using SparseArrays
struct LazySparse{T, TV <: AbstractVector{T}, TI <: Integer, TVI <: AbstractVector{TI}}
I::TVI
J::TVI
V::TV
m::TI
n::TI
end
LazySparse{T}(m::TI, n::TI) where {T, TI} = LazySparse(TI[], TI[], T[], m, n)
function Base.setindex!(a::LazySparse, v, i, j)
push!(a.I, i)
push!(a.J, j)
push!(a.V, v)
return v
end
SparseArrays.sparse(a::LazySparse) = sparse(a.I, a.J, a.V, a.m, a.n)
Example:
m = n = 2
A = LazySparse{Float64}(m, n)
for i in 1:m
for j in 1:n
A[i,j] = rand()
end
end
sparse(A)
9 Likes
I guess now the question is can you make it smart?
By this, I mean is there a way to remove the explicit call to sparse()
?
This would require:
- enforcing any non-
setindex!
call to first call sparse
and cache it
- passing all calls to the cached matrix after the initial non-
setindex!
call
edit: another step could be to clear the cache every time a person calls setindex!
(but that maybe shows why it’s not a good idea to make it smart)
You can just define the functions that you want to have materialize the sparse array which then call the Base
/LinearAlgebra
versions after a call to sparse
. Metaprogramming can make life easier here. The second task can be achieved by memoizing the output of sparse(::LazySparse)
. There is Memoize.jl or you can use Cassette for that. Would be interesting to see if there are any undesirable side effects to this proposal.