Differences in A \ b for sparse and nonsparse rank-deficient A

I have a question regarding the operator \ for SpareseMatrixCSC. Thanks for your help.

using LinearAlgebra, SparseArrays

Is = [9, 8, 9, 12, 2, 5, 6, 11, 8, 12]
Js = [1, 3, 4, 4, 5, 5, 7, 7, 10, 10]
Vs = [0.046668312772398135, 0.0732172552606527, 0.07228019735547542, 0.27500506619596665, 0.8821260519923395, 0.188438924730004, 0.22883463110234215, 0.17705262933291666, 0.297278962545636, 0.07770904672896628]
A = sparse(Is, Js, Vs)
b = [0.0
 0.26743848063303416
 0.0
 0.0
 0.057129952809003536
 0.20824640373847988
 0.0
 0.1580262484804872
 0.09667497046894227
 0.0
 0.16112322314768995
 0.27723584012581337]

qr(A) \ b ≈ A \ b # True
A \ b ≈ pinv(Matrix(A))*b # False
qr(Matrix(A), Val(true)) \ b ≈ pinv(Matrix(A))*b # True

Is this behavior expected?

Perhaps there is a kwarg in qr(::SparseMatrixCSC) that I can use to get the same result as the dense fully pivoted case and without intermediate conversion of A to dense.

1 Like

You have a 12 \times 10 matrix A of rank 5, so the least-square problem \min_x \Vert Ax - b\Vert_2 (which is what A \ b and your other expressions solve) has infinitely many solutions.

For dense matrices, Julia’s A \ b (along with qr(A) \ b and pinv(A) * b) computes the minimum-norm solution to the least-squares problem.

However, for sparse matrices, Julia’s (SuiteSparse’s) A \ b (and qr(A) \ b) computes the sparsest solution, also called the “basic” solution (the solution with the most zero entries).

In your case, you’ll notice that Matrix(A) \ b has 4 zero entries and norm ≈ 1.57, whereas A \ b has 5 zero entries and norm ≈ 2.62.

This is documented if you look at the documentation for \, for example.

10 Likes

Many thanks! I should have read the documentation very carefully. Could you have a look at the logic of the following? I did some partial testing and it seems to return the correct solution.

# Compute the pseudoinverse solution for least squares problem
function pinvsol(A::SparseMatrixCSC, b::AbstractVector)
    m, n = size(A)
    F = qr(A) # SuiteSparse.SPQR.QRSparse
    rnk = rank(F)
    if rnk == n # full column rank
        return qr(A) \ b
    end
    # rank-deficient
    R11 = F.R[Base.OneTo(rnk),Base.OneTo(rnk)]
    R12 = F.R[Base.OneTo(rnk),rnk+1:end]
    S = sparse(R11 \ R12)
    xb = R11 \ (F.Q[:,Base.OneTo(rnk)]'*b[F.prow])
    p = n - rnk
    x2 = [S; -sparse(I,p,p)] \ [xb; zeros(p)]
    x1 = xb - S*x2
    x = [x1; x2]
    x[F.pcol] = x
    return x # least squares least norm solution for sparse A
end

Unfortunately, this creates a dense matrix. (F.Q is not stored explicitly, but as a composition of sparse or low-rank operations so that you can compute F.Q times a vector efficiently. Trying to pull out a subset of the columns of Q discards this structure.)

Instead, I would compute (F.Q' * b[F.prow])[1:rnk], which should be equivalent and efficient without constructing a dense matrix.

Note that this is also not a sparse matrix in general, since inv(R11) is in general not sparse. In general you want to compute the action of matrices like this on vectors rather than computing the matrix itself. But it may be okay for your purposes since maybe you are interested only in the case where the rank is very low, so a dense R11 matrix inverse is acceptable?

I’ve just skimmed your code looking for common sparse-matrix gotchas; I haven’t really looked at the structure of your algorithm. Note that there are some papers on minimum-norm solutions with sparse matrices that may be helpful; for example, Solution of Sparse Underdetermined Systems of Linear Equations (1984)

See also Finding the least norm solution of least squares problem with a sparse matrix for iterative methods.

4 Likes

Many thanks for your help. The logic of this implementation follows from the book by Bjorck (section 2.7.4).

This is actually an overlook on my part. Generally I would avoid this if possible.

I had a look at the paper you sent and it appears the complete orthogonal decomposition is a good candidate to implement. Could you outline what are the “best” (simplicity vs speed) approaches for this problem that people have come up after the book of Bjorck?