Is there a lazy sparse matrix constructor?

Constructing a sparse matrix via the following is slow:

n = 200
m = 250

A = spzeros(n,m)

for i in 1:n
  for j in 1:m
    A[i,j] = rand()
  end
end

The way to speed this up is to use the following syntax:

cur_I = Int[]
cur_J = Int[]
cur_V = Number[]

for i in 1:n
  append!(cur_I, repeat([i], m))
  for j in 1:m
    push!(cur_J, j)
    push!(cur_V, rand())
  end
end

A = sparse(cur_I, cur_J, cur_V, n, m)

This begs the question, why isn’t there a constructor that can lazily put off construction of a sparse matrix until all the elements are ready?

The pseudocode for the result would be something like:

A = spzeros_lazy(n,m)

for i in 1:n
  for j in 1:m
    A[i,j] = rand()
  end
end

lazy_sparse_constructor!(A)

// the goal is not to be the most performant, just performant && easy to code

1 Like

Simple enough to code:

using SparseArrays

struct LazySparse{T, TV <: AbstractVector{T}, TI <: Integer, TVI <: AbstractVector{TI}}
	I::TVI
	J::TVI
	V::TV
	m::TI
	n::TI
end
LazySparse{T}(m::TI, n::TI) where {T, TI} = LazySparse(TI[], TI[], T[], m, n)

function Base.setindex!(a::LazySparse, v, i, j)
	push!(a.I, i)
	push!(a.J, j)
	push!(a.V, v)
	return v
end
SparseArrays.sparse(a::LazySparse) = sparse(a.I, a.J, a.V, a.m, a.n)

Example:

m = n = 2
A = LazySparse{Float64}(m, n)

for i in 1:m
  for j in 1:n
    A[i,j] = rand()
  end
end

sparse(A)
9 Likes

I guess now the question is can you make it smart?

By this, I mean is there a way to remove the explicit call to sparse()?

This would require:

  1. enforcing any non-setindex! call to first call sparse and cache it
  2. passing all calls to the cached matrix after the initial non-setindex! call

edit: another step could be to clear the cache every time a person calls setindex! (but that maybe shows why it’s not a good idea to make it smart)

You can just define the functions that you want to have materialize the sparse array which then call the Base/LinearAlgebra versions after a call to sparse. Metaprogramming can make life easier here. The second task can be achieved by memoizing the output of sparse(::LazySparse). There is Memoize.jl or you can use Cassette for that. Would be interesting to see if there are any undesirable side effects to this proposal.