Sparse dot product: return 1x1 SparseArray, not Float64

I have an array J with a known, fixed sparsity structure: it’s the Jacobian of some function f(x) from R^n to R^n, but the sparsity structure doesn’t depend on x. I’m repeatedly calculating M = c_1 I + c_2 J(x_1)' J(x_2) for different pairs (x_1, x_2). Focus on that second term: because of the transpose, the matrix product reduces to dot products of pairs of columns of J, which works great with the CSC format! I have already computed the sparsity structure of J, with 0.0 stored for entries that may later become nonzero. Now I want to calculate the sparsity structure of M. Here’s what I have right now:

using SparseArrays
function create_M_matrix_structure!(
    rows::Vector{Int32},
    columns::Vector{Int32},
    values::Vector{Float64},
    Jv::SparseMatrixCSC{Float64, Int32})

    for (i, j) in Iterators.product(axes(Jv, 2), axes(Jv, 2))  # pairs of columns
        v = @view Jv[:, i]
        w = @view Jv[:, j]
        val = SparseArrays.dot(v, w)
        # how do I tell if val is structurally nonzero?
        # I could check if v .* w has structurally nonzero entries,
        # but that will often allocate several entries when I only need 1...
        if val_is_structurally_nonzero
            push!(rows, i)
            push!(columns, j)
            push!(values, val)
        end
    end
end

What’s the right replacement for val_is_structurally_nonzero? Basically, I want a 1x1 SparseArray, not a Float64, so that I can check if there’s 1 or 0 entries allocated.

I could simply set all structurally nonzero entries of J to 1.0 via nonzeros(Jv) .= 1.0; then val != 0.0 would do the trick. But J contains some data that I’d like to preserve.
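
Spelled out, that idea would be something like the sketch below (which I’d rather not do, since it clobbers the values stored in J):

nonzeros(Jv) .= 1.0                       # every stored entry becomes 1.0
v = @view Jv[:, i]
w = @view Jv[:, j]
val = SparseArrays.dot(v, w)              # counts the overlapping stored indices
val_is_structurally_nonzero = val != 0.0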

I wouldn’t compute the dot product, I would look at the indices in each column, e.g.

rows = rowvals(Jv)
@views irows, jrows = rows[nzrange(Jv, i)], rows[nzrange(Jv, j)]
val_is_structurally_nonzero = !isdisjoint(irows, jrows)

PS. Your argument-type declarations seem overly narrow. They don’t help performance, they just make your function less generic.
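
For instance, something along these lines is just as fast and more reusable (a sketch; the body is unchanged):

function create_M_matrix_structure!(rows::AbstractVector{<:Integer},
                                    columns::AbstractVector{<:Integer},
                                    values::AbstractVector{<:Real},
                                    Jv::SparseMatrixCSC)
    # ... same body as above ...
end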

Aha. I haven’t come across nzrange before; that makes writing algorithms that work column-by-column much cleaner.

Will the !isdisjoint be done in an efficient manner? irows and jrows are sorted, so there’s an allocation-free, linear-time merge-style intersection algorithm. But the type of irows is fairly generic, view(::Vector{Int64}, 14:26), so I suspect it’ll fall back to something slower…

isdisjoint doesn’t take advantage of the inputs being sorted: it uses a quadratic-time pairwise check for small collections, and allocates a Set (a hash table) for larger ones.

But you can easily write an isdisjoint_sorted function that implements the fast algorithm for two sorted arrays.
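
A minimal sketch, assuming both inputs are sorted and support indexing (the usual two-pointer merge scan, linear in the combined length):

function isdisjoint_sorted(a, b)
    i, j = firstindex(a), firstindex(b)
    while i <= lastindex(a) && j <= lastindex(b)
        if a[i] == b[j]
            return false        # shared index found
        elseif a[i] < b[j]
            i += 1
        else
            j += 1
        end
    end
    return true                 # ran off one end without finding a match
end

Then the check becomes val_is_structurally_nonzero = !isdisjoint_sorted(irows, jrows), with no allocation.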

You’re just computing J'*J in COO (sometimes called IJV) format. I expect the library authors have thought very hard about the most efficient way to do the multiplication (and conversion to COO is pretty fast) so why not let them do it for you? It looks like the library already computes the structural sparsity rather than the numeric sparsity, so it seems to fit your needs perfectly.

julia> J = sparse(Int32[1,1,4], Int32[2,3,4], Float64[0,0,0])
4×4 SparseMatrixCSC{Float64, Int32} with 3 stored entries:
  ⋅   0.0  0.0   ⋅
  ⋅    ⋅    ⋅    ⋅
  ⋅    ⋅    ⋅    ⋅
  ⋅    ⋅    ⋅   0.0

julia> J'*J # structural sparsity pattern!
4×4 SparseMatrixCSC{Float64, Int32} with 5 stored entries:
  ⋅    ⋅    ⋅    ⋅
  ⋅   0.0  0.0   ⋅
  ⋅   0.0  0.0   ⋅
  ⋅    ⋅    ⋅   0.0

julia> rows,cols,vals = findnz(J'*J)
(Int32[2, 3, 2, 3, 4], Int32[2, 2, 3, 3, 4], [0.0, 0.0, 0.0, 0.0, 0.0])
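
For what it’s worth, the c_1 I diagonal term can be folded in with the same library operations. A sketch, where c1 and c2 are placeholder values (note that an exactly zero c1 would not create stored diagonal entries):

using LinearAlgebra, SparseArrays
c1, c2 = 1.0, 1.0                   # placeholder coefficients
M = c2 * (J' * J) + c1 * I          # sparse result, with diagonal entries stored
rows, cols, vals = findnz(M)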

But as to the question as originally asked: the inner product of two sparse vectors always returns a scalar (a plain scalar is smaller and cheaper than even an empty 1x1 sparse matrix). Matrix products make no such exception, though, so you can use this:

julia> view(J, :, 1:1)' * view(J, :, 2:2) # N*1 matrices
1×1 SparseMatrixCSC{Float64, Int32} with 0 stored entries:
  ⋅

julia> view(J, :, 1:1)' * view(J, :, 2) # or a matrix-vector product
1-element SparseVector{Float64, Int32} with 0 stored entries
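
A sketch of how that slots into the loop from the original post, where i and j are the two column indices:

P = view(J, :, i:i)' * view(J, :, j:j)      # 1x1 sparse matrix
val_is_structurally_nonzero = nnz(P) > 0    # a stored entry means structural overlap
# when it is structurally nonzero, the numeric value is nonzeros(P)[1]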

But the built-in multiplication is still very likely to outperform your original function built on this.

Here’s my by-hand implementation:

using SparseArrays
# Fill the stored entries of M with the corresponding entries of A'B.
# M's sparsity pattern must already cover the structural nonzeros of A'B.
function sparse_mul_At_B!(M::SparseMatrixCSC,
    A::SparseMatrixCSC,
    B::SparseMatrixCSC)
    rows = rowvals(M)
    vals = nonzeros(M)
    n = size(M, 2)                      # number of columns of M
    for colInd in 1:n
        w = @view B[:, colInd]
        for i in nzrange(M, colInd)     # stored entries in column colInd of M
            rowInd = rows[i]
            v = @view A[:, rowInd]
            vals[i] = SparseArrays.dot(v, w)
        end
    end
    return M
end

Comparing against the library implementation as follows:

using BenchmarkTools
N, d = 100, 0.05
A = SparseMatrixCSC{Float64, Int32}(sprand(N, N, d))
A_old = deepcopy(A)
M = A_old' * A                      # reuse this pattern for the in-place version
nonzeros(A) .= rand(nnz(A))
nonzeros(A_old) .= rand(nnz(A_old))
# interpolate the globals with $ so BenchmarkTools times just the call
@benchmark sparse_mul_At_B!($M, $A_old, $A)
@benchmark $A_old' * $A

The results are interesting: the library function is typically 4x faster, but sometimes much, much slower. It makes me realize the outsized impact of GC on runtime, and the limitations of benchmarking functions in isolation from each other.


edit: hmm, that’s a pretty small problem size for benchmarking. Experimenting with different sizes and densities, my implementation typically has a better worst case. But I think I’ll probably just use the library one for now. Thanks!