Sure! So, we want to solve
\text{minimize} ~ \|Ax - b\|_2^2 + 2c^Tx
for several vectors b, c and fixed A. There are several ways of approaching this; the one I’ve been using is rather simple: expand the squared norm (dropping the constant \|b\|_2^2) to get a convex quadratic and minimize that directly, i.e., we want the minimizer of
x^T(A^TA)x - 2(A^Tb - c)^Tx,
which is minimized at
x^\star = (A^TA)^{-1}(A^Tb - c),
assuming that A has full column rank (which I enforce). In that case, computing and caching the Cholesky factorization of A^TA lets us solve the problem repeatedly at a relatively modest cost per solve (in an inner loop), especially considering that A is quite sparse. Simple code to do this is
using LinearAlgebra, SparseArrays   # sparse cholesky comes from CHOLMOD via the SuiteSparse stdlib
A = ...                             # sparse, full column rank
max_iter = 1000
A_fact = cholesky(A' * A)           # factor the Gram matrix once and cache it
for i in 1:max_iter
    b, c = ...                      # new data each iteration
    x_sol = A_fact \ (A' * b - c)   # cheap back solve against the cached factorization
    # Do something with x_sol
end
Of course, computing A' * A is somewhat expensive (it’s only done once, but these are somewhat large matrices) and likely destroys some of the useful sparsity patterns available in A. On the other hand, if we could compute a factorization of A' * A in a way that already exploits the sparsity of A (e.g., via a fill-reducing ordering such as COLAMD) and doesn’t require constructing the Gram matrix explicitly, that would be ideal (we would simply replace A_fact with whatever the new factorization is). Several papers (as in here, section 3.2) indicate that CHOLMOD has this capability, and I was wondering whether it is exposed within the SuiteSparse bindings for Julia.
@stevengj instead suggested that I compute the sparse QR factorization of A and use that to solve the least-squares problem. There are two ways of going about that. One, given by @jlchan, is to use the fact that R^TR = A^TA (up to the column permutation SPQR applies), so the R factor can stand in for the Cholesky factor when solving the normal equations. A second possibility (the one suggested by @stevengj) is to use the QR factorization of A directly to solve two problems: a least-norm problem for finding a solution d of A^Td + c = 0, and the (equivalent) least-squares problem \min \|Ax - (b + d)\|_2^2. (The equivalence follows since, for any d with A^Td = -c, expanding \|Ax - (b + d)\|_2^2 gives \|Ax - b\|_2^2 + 2c^Tx plus a constant.) A rough sketch of the first idea is below.
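For concreteness, a rough (untested) sketch of the first idea might look like the following; it assumes A has full column rank and uses the column permutation F.pcol that SPQR applies, so that R^TR equals the Gram matrix of the permuted columns (variable names otherwise as above):
using LinearAlgebra, SparseArrays
F = qr(A)                       # sparse QR via SPQR; never forms A' * A
g = A' * b - c                  # right-hand side of the normal equations
y = F.R \ (F.R' \ g[F.pcol])    # two triangular solves, since R'R = (A[:, F.pcol])' * A[:, F.pcol]
x_sol = similar(y)
x_sol[F.pcol] = y               # undo the column permutation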
Because the matrices are sparse and the procedures are somewhat different, it is unclear which of the two (three?) approaches will be faster. (There are tradeoffs in the fill-in incurred by each of the possible factorizations, which could heavily change the performance of the back solves.) As far as I know, Cholesky has fairly small fill-in and good performance even when solving normal-type equations (when compared to QR), so I’m biased towards using it, but, really, we won’t know unless we try this out, as all of this depends heavily on the sparsity pattern, among many other things.
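If it helps, a quick-and-dirty way to compare the one-off and per-iteration costs on the actual data would be something like this (a hypothetical timing harness using the BenchmarkTools package; A, b, c as above, with the QR permutation handling as in the earlier sketch):
using BenchmarkTools, LinearAlgebra, SparseArrays
@btime cholesky($A' * $A)   # one-off cost: form the Gram matrix, then factor with CHOLMOD
@btime qr($A)               # one-off cost: sparse QR via SPQR
F_chol = cholesky(A' * A)
F_qr   = qr(A)
g  = A' * b - c
gp = g[F_qr.pcol]
@btime $F_chol \ $g                 # per-iteration cost: Cholesky back solve
@btime $F_qr.R \ ($F_qr.R' \ $gp)   # per-iteration cost: two triangular solves with R
Given the inner loop, the per-solve timings are probably the ones that matter most.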