Solve Large Scale Underdetermined Linear Equation with per Element Equality Constraint

I implemented both methods described above.
The matrix is based on the definition in Build the Affinity Matrix of Edge Preserving Multiscale Image Decomposition based on Local Extrema.
The image size was 250 x 250 and the size of the set was \left| \mathcal{V} \right| = 13747.
The results I got, using Julia:

  • Direct Solvers:
    • Direct Solution (Solving the formulation with \boldsymbol{A}):
      • Error: 0 (Will be used as reference).
      • Run Time: 0.14130 [Sec].
    • LS Solution (Solving the formulation with \boldsymbol{B}):
      • Error: 4e-12.
      • Run Time: 0.21624 [Sec].
  • Iterative Solvers:
    • LS Solution based on Conjugate Gradient:
      • Error (Max Absolute vs. Direct Solution): 0.000047055766.
      • Run Time: 8.86836 [Sec].
    • Direct Solution based on LSMR:
      • Error (Max Absolute vs. Direct Solution): 0.086388343173.
      • Run Time: 4.27441 [Sec].
    • Direct Solution based on LSQR:
      • Error (Max Absolute vs. Direct Solution): 0.001536492862.
      • Run Time: 7.62469 [Sec].
    • Direct Solution based on CRAIG:
      • Error (Max Absolute vs. Direct Solution): 0.000066854811.
      • Run Time: 9.38470 [Sec].
    • Direct Solution based on CGLS:
      • Error (Max Absolute vs. Direct Solution): 0.000083813606.
      • Run Time: 8.60961 [Sec].

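For reference, a minimal sketch of the two direct variants. This is not the exact code I ran; the names mA and vB are placeholders for the system matrix and right hand side, and it assumes the LS variant solves the normal equations with \boldsymbol{B} = \boldsymbol{A}^{T} \boldsymbol{A}:

```julia
using SparseArrays, LinearAlgebra

# Direct solution: solve the formulation with A using Julia's sparse backslash,
# which dispatches to a suitable sparse factorization.
function SolveDirect(mA::SparseMatrixCSC, vB::Vector{Float64})
    return mA \ vB
end

# LS solution: build B = A' * A and solve the normal equations.
# Building mB is the dominant cost (see the remark below).
function SolveLS(mA::SparseMatrixCSC, vB::Vector{Float64})
    mB = mA' * mA
    return mB \ (mA' * vB)
end
```
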
From my point of view, if a direct solver is feasible, the direct formulation is the fastest.
If an iterative method is needed, go with CG when accuracy is required, or LSMR when speed is required.

The ranking might change with more tweaking, but all the iterative solvers seem to be in the same ballpark.

All iterative solvers are from Krylov.jl with the default settings.
Run time was measured with BenchmarkTools.jl using @belapsed.
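
A sketch of how the iterative runs and the metrics were set up, assuming mA and vB are the sparse system matrix and right hand side of the formulation with \boldsymbol{A} (as in the sketch above) and vXRef is the direct solution used as the reference:

```julia
using SparseArrays, LinearAlgebra
using Krylov, BenchmarkTools

function RunIterativeSolvers(mA::SparseMatrixCSC, vB::Vector{Float64}, vXRef::Vector{Float64})
    mB = mA' * mA;                                # Normal equations matrix (the LS formulation)
    vC = mA' * vB;

    vXCG, _       = Krylov.cg(mB, vC);            # LS solution via Conjugate Gradient (SPD system)
    vXLSMR, _     = Krylov.lsmr(mA, vB);          # Direct formulation via LSMR
    vXLSQR, _     = Krylov.lsqr(mA, vB);          # Direct formulation via LSQR
    vXCRAIG, _, _ = Krylov.craig(mA, vB);         # Direct formulation via CRAIG (consistent systems)
    vXCGLS, _     = Krylov.cgls(mA, vB);          # Direct formulation via CGLS

    maxAbsErr = maximum(abs.(vXLSMR .- vXRef));   # Max absolute error vs. the direct solution
    runTime   = @belapsed Krylov.lsmr($mA, $vB);  # Run time metric as reported above

    return maxAbsErr, runTime
end
```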

Remark: It seems most of the time in the LS method is spent computing \boldsymbol{B} = \boldsymbol{A}^{T} \boldsymbol{A}. Once there is a threaded function for the sparse matrix - sparse matrix product, the LS method should be faster than the direct method. A PoC using SuiteSparseGraphBLAS.jl showed it is indeed faster. Yet in my case the matrices are SparseMatrixCSC{Float64, Int32}, which is not supported in SuiteSparseGraphBLAS.jl. I asked for this in MKLSparse.jl as well.
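
A sketch of the workaround used in the PoC. The assumptions here are that widening the indices to Int64 is acceptable (it copies the matrix) and that GBMatrix and the conversion back to SparseMatrixCSC behave as in the SuiteSparseGraphBLAS.jl documentation; the random matrix is only a stand-in for the real one:

```julia
using SparseArrays, SuiteSparseGraphBLAS

# Stand-in for the real matrix (which is SparseMatrixCSC{Float64, Int32}).
mA32 = SparseMatrixCSC{Float64, Int32}(sprand(2000, 1500, 0.005));

# Workaround: widen the indices to Int64, then move to a GraphBLAS matrix.
mA64 = SparseMatrixCSC{Float64, Int64}(mA32);
mAGB = GBMatrix(mA64);              # GraphBLAS matrix (multi threaded kernels)
mBGB = mAGB' * mAGB;                # Threaded sparse matrix - sparse matrix product
mB   = SparseMatrixCSC(mBGB);       # Back to SparseMatrixCSC for the factorization
```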