Solve Large Scale Underdetermined Linear Equation with per Element Equality Constraint

I have the following linear system to solve:

\boldsymbol{A} \boldsymbol{x} = \boldsymbol{0}, \quad \text{subject to} \; {x}_{i} = {v}_{i} \; \forall i \in \mathcal{V}

Where \boldsymbol{A} \in \mathbb{R}^{m \times n} with m < n.
The matrix \boldsymbol{A} is a banded matrix (5-9 bands).
The set \mathcal{V} is a subset of indices used to force equality on a subset of the values of \boldsymbol{x}.

I wonder how to solve this in the most efficient way.

My current approach is to make this a Least Squares Problem:

\begin{align} \arg \min_{\boldsymbol{x}} \quad & \frac{1}{2} \boldsymbol{x}^{T} \boldsymbol{B} \boldsymbol{x} \\ \text{subject to} \quad & \begin{aligned} {x}_{i} & = {v}_{i}, \; \forall i \in \mathcal{V} \\ \end{aligned} \end{align}

Where \boldsymbol{B} = \boldsymbol{A}^{T} \boldsymbol{A} is a Symmetric Positive Semi Definite (SPSD) matrix.

By defining a permutation matrix \boldsymbol{P} one can arrange the elements:

\begin{bmatrix} \boldsymbol{x}_{\mathcal{U}} \\ \boldsymbol{x}_{\mathcal{V}} \end{bmatrix} = \boldsymbol{P} \boldsymbol{x}

Which yields a partitioned formulation of the problem:

\boldsymbol{x}^{T} \boldsymbol{B} \boldsymbol{x} = \boldsymbol{x}^{T} \boldsymbol{P}^{T} \left( \boldsymbol{P} \boldsymbol{B} \boldsymbol{P}^{T} \right) \boldsymbol{P} \boldsymbol{x} = \begin{bmatrix} \boldsymbol{x}_{\mathcal{U}}^{T} & \boldsymbol{x}_{\mathcal{V}}^{T} \end{bmatrix} \begin{bmatrix} \boldsymbol{B}_{\mathcal{U}} & \boldsymbol{C} \\ \boldsymbol{C}^{T} & \boldsymbol{B}_{\mathcal{V}} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}_{\mathcal{U}} \\ \boldsymbol{x}_{\mathcal{V}} \end{bmatrix}

Since only \boldsymbol{x}_{\mathcal{U}} is sought (the term involving \boldsymbol{x}_{\mathcal{V}} alone is constant), the problem becomes:

\arg \min_{\boldsymbol{x}_{\mathcal{U}}} \frac{1}{2} \boldsymbol{x}_{\mathcal{U}}^{T} \boldsymbol{B}_{\mathcal{U}} \boldsymbol{x}_{\mathcal{U}} + \boldsymbol{x}_{\mathcal{U}}^{T} \boldsymbol{C} \boldsymbol{x}_{\mathcal{V}}

Which is a convex, smooth, and constraint-free problem.
Its minimizer is found where the gradient vanishes:

\boldsymbol{B}_{\mathcal{U}} \hat{\boldsymbol{x}}_{\mathcal{U}} = - \boldsymbol{C} \boldsymbol{x}_{\mathcal{V}}

Which is a simple linear system with an SPSD matrix.
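For concreteness, here is a minimal sketch of this elimination in Julia (the names A, idxV and v are illustrative; cg is from Krylov.jl):

```julia
using Krylov, SparseArrays, LinearAlgebra

# Illustrative setup: A is the m×n sparse banded matrix, idxV holds the
# constrained indices and v their prescribed values.
n    = size(A, 2)
idxU = setdiff(1:n, idxV)        # free indices

B   = A' * A                     # SPSD, still sparse
B_U = B[idxU, idxU]              # block acting on the free variables
C   = B[idxU, idxV]              # coupling block

xU, stats = cg(B_U, -(C * v))    # solve B_U x_U = -C x_V

x        = zeros(n)              # assemble the full solution
x[idxU] .= xU
x[idxV] .= v
```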

The advantages of this solution:

  1. A smaller, constraint-free system.
  2. SPSD system.

The weakness is that the system is built from \boldsymbol{A}^{T} \boldsymbol{A}, which squares the condition number.

Is there a better approach?
Usually I’d solve the last equation using the Conjugate Gradient solver in Krylov.jl. Is there an optimized solver for this?

Remark: I also posted the question on StackExchange Computational Science.

Why can’t you just delete A[:, V] and x[V] and solve the smaller system?


@amontoison this one is for you

Hi @RoyiAvital !
Is your problem not equivalent to solving something similar to
[A; I] [x; y] = [0; v]?

The block I should be simplified.

Let’s call E the matrix needed for your problem, where some rows of I are removed.
If [A; E] has more rows than columns then use lsmr, otherwise use craigmr.
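A minimal sketch of this idea (assuming the A, idxV and v from the original post; E simply selects the fixed entries of x):

```julia
using Krylov, SparseArrays, LinearAlgebra

m, n = size(A)
k    = length(idxV)

E = sparse(1:k, idxV, ones(k), k, n)   # E * x == x[idxV]
K = [A; E]                             # stacked (m + k) × n system
b = [zeros(m); v]

if size(K, 1) >= size(K, 2)
    x, stats = lsmr(K, b)              # least-squares solution (tall or square K)
else
    x, y, stats = craigmr(K, b)        # minimum-norm solution (wide K)
end
```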


Their values are used to determine the values of the other elements.
If I got your idea right, simply deleting them means their values won’t propagate into the other elements.

Indeed, the original problem is given by:

\begin{bmatrix} \boldsymbol{A} \\ \boldsymbol{E} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}_{\mathcal{U}} \\ \boldsymbol{x}_{\mathcal{V}} \end{bmatrix} = \begin{bmatrix} 0 \\ \boldsymbol{x}_{\mathcal{V}} \end{bmatrix}

Where:

  • \boldsymbol{E} = \begin{bmatrix} \boldsymbol{0} & \boldsymbol{I} \end{bmatrix}.
  • \mathcal{U} = \left\{ 1, 2, \ldots, n \right\} \setminus \mathcal{V}.

But it is a larger system (n \times n) which is neither symmetric nor PSD. I thought it would be slower to solve.
Will the solvers be able to take advantage of its structure?
Maybe I need to build a specialized operator?

I just thought that in the literature there are optimized solvers or approaches for such a structure.

@RoyiAvital It will not be slower because optimized Krylov solvers have been developed for rectangular problems (least-squares and least-norm).

lsmr and craigmr are equivalent to minres (in exact arithmetic) on the normal equations A'A x = A'b and AA' y = b, respectively. These are the optimality conditions of the least-squares and least-norm problems.

For information, lsqr and craig are equivalent to cg on the normal equations above.

Note that it’s way more stable than using cg and minres directly on the normal equations.
It’s not recommended to use symmetric solvers on normal equations directly.
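To illustrate the equivalence on a small random example (made-up sizes and data, not the problem above):

```julia
using Krylov, SparseArrays, LinearAlgebra, Random

Random.seed!(1)
m, n = 200, 120
A = sprandn(m, n, 0.3)
b = randn(m)

x1, _ = lsmr(A, b)                # works on A directly
x2, _ = minres(A' * A, A' * b)    # explicitly formed normal equations (not recommended)

# In exact arithmetic the iterates coincide; in floating point the agreement is
# limited by the squared condition number of A.
norm(x1 - x2) / norm(x1)
```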

You can create an operator if you don’t want to store [A; E] but only the blocks A and E. However, you will not gain performance, just storage.
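A rough sketch of such an operator with LinearOperators.jl (illustrative only; the constructor and the 5-argument product convention should be checked against the package documentation):

```julia
using Krylov, LinearOperators, SparseArrays, LinearAlgebra

# Illustrative setup: A is m×n sparse, idxV the fixed indices, v their values.
m, n = size(A)
k    = length(idxV)

# res = α * [A; E] * x + β * res
function Kprod!(res, x, α, β)
    mul!(view(res, 1:m), A, x, α, β)
    r2 = view(res, m+1:m+k)
    if iszero(β)
        r2 .= α .* view(x, idxV)
    else
        r2 .= α .* view(x, idxV) .+ β .* r2
    end
    return res
end

# res = α * [A; E]' * u + β * res
function Ktprod!(res, u, α, β)
    mul!(res, A', view(u, 1:m), α, β)
    res[idxV] .+= α .* view(u, m+1:m+k)
    return res
end

K = LinearOperator(Float64, m + k, n, false, false, Kprod!, Ktprod!, Ktprod!)
x, stats = lsmr(K, [zeros(m); v])
```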


OK, I will give it a try.
One delicate thing (at least when using direct solvers): the LS approach will always produce a solution with the equalities enforced exactly. The formulation with \boldsymbol{E}, in case there is no solution in the range of the model matrix, will produce a least-squares solution, so the equalities will only be approximated.

I will create a short script and share results.

I implemented both methods from above.
The matrix is based on the definition in Build the Affinity Matrix of Edge Preserving Multiscale Image Decomposition based on Local Extrema.
The image size was 250 x 250. The size of the set was \left| \mathcal{V} \right| = 13747.
The results I got, using Julia:

  • Direct Solvers:
    • Direct Solution (Solving the formulation with \boldsymbol{A}):
      • Error: 0 (Will be used as reference).
      • Run Time: 0.14130 [Sec].
    • LS Solution (Solving the formulation with \boldsymbol{B})
      • Error: 4e-12.
      • Run Time: 0.21624 [Sec].
  • Iterative Solvers:
    • LS Solution based on Conjugate Gradient:
      • Error (Max Absolute vs. Direct Solution): 0.000047055766.
      • Run Time: 8.86836 [Sec].
    • Direct Solution based on LSMR:
      • Error (Max Absolute vs. Direct Solution): 0.086388343173.
      • Run Time: 4.27441 [Sec].
    • Direct Solution based on LSQR:
      • Error (Max Absolute vs. Direct Solution): 0.001536492862.
      • Run Time: 7.62469 [Sec].
    • Direct Solution based on CRAIG:
      • Error (Max Absolute vs. Direct Solution): 0.000066854811.
      • Run Time: 9.38470 [Sec].
    • Direct Solution based on CGLS:
      • Error (Max Absolute vs. Direct Solution): 0.000083813606.
      • Run Time: 8.60961 [Sec].

From my point of view, if a direct solver is feasible, the direct formulation will be the fastest.
If an iterative method is needed, probably go with CG if accuracy is required or with LSMR if speed is required.

It might be different with more tweaking, but it seems they all are in the same ballpark.

All iterative solvers are from Krylov.jl with the default settings.
Run time measured by BenchmarkTools.jl using @belapsed.

Remark: It seems most of the time in the LS method is spent on the calculation of \boldsymbol{B} = \boldsymbol{A}^{T} \boldsymbol{A}. Once there is a threaded function for the sparse matrix - sparse matrix product, it will be faster than the direct method. A PoC using SuiteSparseGraphBLAS.jl showed it is faster. Yet in my case the matrices are SparseMatrixCSC{Float64, Int32}, which is not supported in SuiteSparseGraphBLAS.jl. I asked for it in MKLSparse.jl as well.

A’A is only computed if you use cg on the normal equations, otherwise you should provide A as input.
Did you try with using MKL and using MKLSparse before using Krylov?

I recently updated MKLSparse such that sparse matrix × dense vector products are multithreaded for both integer types (Int32 and Int64).
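The load order would be something like:

```julia
using MKL        # switch the dense BLAS/LAPACK backend to MKL
using MKLSparse  # multithreaded MKL kernels for sparse matrix-vector products
using Krylov
```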

I did provide only \boldsymbol{A}.
I tested all of them using the same environment, so it is a fair comparison, as MKLSparse.jl will benefit all of them (the iterative methods).

I don’t think it will change the conclusions:

  1. Using a Direct Solver: If there is a fast Sparse Matrix - Sparse Matrix multiplication, use the LS (Normal Equations) form. Otherwise, use the direct form of the problem.
  2. Using Iterative Solver: For accuracy, use cg(). For speed use lsmr().

Does it surprise you? Want me to check something more?

The code is available on my StackExchange Computational Science GitHub Repository (Look at the ComputationalScience\Q44552 folder).

I might be taking this a step back after you have already received some more specific responses, but I’m not sure I fully understood the problem. Adding some further partitioning, you essentially state that your large system is of the form

\begin{bmatrix} \boldsymbol{A}_{\mathcal{U}} & \boldsymbol{A}_{\mathcal{V}} \\ 0 & \boldsymbol{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{x}_{\mathcal{U}} \\ \boldsymbol{x}_{\mathcal{V}} \end{bmatrix} = \begin{bmatrix} 0 \\ \boldsymbol{x}_{\mathcal{V}} \end{bmatrix},

where the matrix is n\times n. This implies that I is (n-m)\times (n-m) and that \boldsymbol{A}_{\mathcal{U}} is m\times m. If \boldsymbol{A}_{\mathcal{U}} has rank m, then you can set \boldsymbol{x}_{\mathcal{V}} = \boldsymbol{v} for some prescribed \boldsymbol{v} as in your original post. Then you have

\boldsymbol{A}_{\mathcal{U}} \boldsymbol{x}_{\mathcal{U}} = -\boldsymbol{A}_{\mathcal{V}} \boldsymbol{v}

which would be a square nonsingular system. Since \boldsymbol{B}_{\mathcal{U}} = \boldsymbol{A}_{\mathcal{U}}^T \boldsymbol{A}_{\mathcal{U}}, your other formulation would also be positive definite, although I don’t see much reason to form \boldsymbol{B}_{\mathcal{U}} when you can solve the system with \boldsymbol{A}_{\mathcal{U}} directly.
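As a sketch, assuming \boldsymbol{A}_{\mathcal{U}} is indeed nonsingular (A, idxV and v as earlier in the thread):

```julia
using SparseArrays, LinearAlgebra

idxU = setdiff(1:size(A, 2), idxV)
A_U  = A[:, idxU]                  # m × m when |V| = n - m
A_V  = A[:, idxV]

xU = A_U \ (-A_V * v)              # sparse LU via `\`; valid only if A_U is nonsingular
```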

On the other hand, if \boldsymbol{A}_{\mathcal{U}} has rank less than m, either A has rank less than m or you made an unfortunate choice of columns to include in \boldsymbol{A}_{\mathcal{U}}. (I’m not sure if you have any choice here…) So assuming \boldsymbol{A}_{\mathcal{U}} is singular, I am wondering which case it is: Does A have rank less than m or are you just stuck with a bad choice of columns in \boldsymbol{A}_{\mathcal{U}} that you are not free to change? For singular \boldsymbol{A}_{\mathcal{U}} is there some reason to believe a solution exists for your particular choice of \boldsymbol{v}? If it exists are you looking for a minimum norm solution or any solution? If it doesn’t are you looking for a minimum norm least squares solution?


@RoyiAvital this seems to me like the most sensible suggestion here. Delete columns of A corresponding to variables in V and solve a thinner least-squares problem. Why would you iterate on fixed variables? Either use LSMR or Golub/Riley on the eliminated problem.

ps: it doesn’t help to spread your questions across several threads. I did not see this before.


This is actually how I solve it in my code for Edge Preserving Multiscale Image Decomposition based on Local Extrema.
In practice, in some of those image processing problems (from the prehistoric era) you get an SPSD matrix to begin with (like \boldsymbol{B}, the Laplacian of the graph). Yet in this formulation the matrix is not even symmetric, hence the “Pseudo Laplacian”.

I like implementing those papers in my free time.
They are probably useless these days, yet they make the thinking wheels move.

The formulation of the matrix means it is rank deficient,
as the rows of \boldsymbol{A}_{\mathcal{U}} must have zero sum.
Hence any constant vector is an eigenvector of \boldsymbol{A}_{\mathcal{U}} with eigenvalue 0.
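A quick numerical check of this claim (with A_U extracted as in the sketches above):

```julia
using LinearAlgebra

norm(A_U * ones(size(A_U, 2)))   # ≈ 0 when every row of A_U sums to zero
```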

I might have overlooked this. But unless you use the formulation done for \boldsymbol{B} (spelled out in @mstewart 's post), I don’t see how deleting can work. You must build a different set of equations; deleting alone won’t work, as the values are needed.

The other thread is solely on applying Cholesky Decomposition to a rank deficient sparse SPSD matrix. I was wondering how to handle it given pivoting is not available for the sparse case.
It drifted to a similar discussion as people, rightfully, asked for the context.

That is a good explanation. It means your solution won’t be unique. But is the system expected to be consistent? What sort of solution are you looking for? A minimum norm least squares solution?

I think this was probably the same suggestion as mine and I was just a bit more explicit about the details. After I made my post, I looked back and thought I should have pointed out that I wasn’t the first person making the suggestion. If I understand correctly now, you pick \boldsymbol{x}_{\mathcal{V}} and solve for \boldsymbol{x}_{\mathcal{U}} either way. So I’m still not seeing what benefit you get working with \boldsymbol{B} instead of working with \boldsymbol{A}_{\mathcal{U}}.


When I did it first I just went with intuition.
Later I proved it mathematically using permutation matrices (a subset of the rows of a permutation matrix).
I overlooked just employing the simple choice and writing the equation as you did.
Indeed, I now think this is what @Oscar_Smith meant, yet I took it literally as removing the rows.

I guess the only benefit comes down to the usual trade-off when dealing with the LS system:
you may gain some speed in the case m < n by solving the normal equations, but you pay with accuracy as you square the condition number.
In some cases it is worth doing (classic applications are Harris Corner Detection and Optical Flow), in others it might be too dangerous. I guess in the real world it depends on the given time budget.

If V is the set of fixed variables (that’s what they are called in optimization) and F is the set of remaining (free) variables, partition WLOG,

A = \begin{bmatrix} A_F & A_V \end{bmatrix}

and x accordingly. Then,

Ax = 0 \Longleftrightarrow A_F x_F = -A_V x_V.

Thus, if A_F is still underdetermined, you can minimize \| x_F \| subject to the eliminated system above.
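As a sketch with Krylov.jl (A, idxV and the vector of fixed values xV are placeholders; craig computes a minimum-norm solution of a consistent system, lsmr a minimum-norm least-squares solution otherwise):

```julia
using Krylov, SparseArrays, LinearAlgebra

idxF = setdiff(1:size(A, 2), idxV)   # free variables
A_F  = A[:, idxF]
b    = -(A[:, idxV] * xV)

xF, y, stats = craig(A_F, b)         # min ‖x_F‖ s.t. A_F x_F = b (consistent case)
# xF, stats = lsmr(A_F, b)           # min-norm least-squares solution otherwise
```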


Indeed.
This is exactly the same trick I used in the code and that @mstewart spelled out.
I didn’t understand @Oscar_Smith 's post that way, hence I overlooked it.