Suggestions needed for bound-constrained least squares solver

I hate to revive an old thread, but I needed something like this myself recently, and I wanted a solver that:

  1. Does not square the condition number — a lot of the solutions in this thread seem to work by explicitly forming A^T A and minimizing the QP \frac{1}{2} x^T A^T A x - (A^T b)^T x subject to bound constraints. In contrast, I want something like A \ b that uses pivoted QR to avoid exacerbating inaccuracies for ill-conditioned A (see the short example after this list).
  2. Supports sparse A (via the sparse-QR that A \ b uses).
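
To make point 1 concrete, here is a toy example of my own (a Läuchli-type matrix, not taken from this thread): forming A^T A roughly squares the condition number, so a normal-equations solve loses about twice as many digits as a QR-based A \ b.

using LinearAlgebra
ε = 1e-7
A = [1 1; ε 0; 0 ε]   # Läuchli-type matrix: cond(A) ≈ √2/ε ≈ 1.4e7
b = [3, ε, 2ε]        # exact least-squares solution is x = [1, 2]
x_qr = A \ b          # pivoted QR: relative error ~ eps*cond(A) ≈ 1e-9
x_ne = (A'A) \ (A'b)  # normal equations: error ~ eps*cond(A)^2 ≈ 1e-2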

I came up with a short (~25-line) function that uses a Newton-like algorithm that repeatedly updates the active set (similar in spirit to BoundedLeastSquares.jl from above, I think?), and relies only on LinearAlgebra and A \ b. It handily outperforms all of the generic-optimization packages above (though StructuredOptimization comes close!), is only a bit slower than NNLS.jl (which looks like it squares the condition number, and doesn’t handle upper bounds), and seems to be a few times slower than BoundedLeastSquares.jl, primarily because of the cost of the QR factorization (the price of not squaring the condition number).

I haven’t yet proved that it must converge, but I haven’t been able to find a counter-example — it always seems to converge in ≤ 4 iterations (even for seemingly tricky problems where the unconstrained minimum is close to the bounds in many coordinates). It would be easy enough to add a fallback to something simple like ADMM (while still avoiding squaring the condition number), I suppose; see the sketch at the end of this post. I figured I would post it here in case people are searching this thread.

using LinearAlgebra

"""
    lsq_box(A::AbstractMatrix, b::AbstractVector, lo::AbstractVector, hi::AbstractVector)

Return the least-squares solution `x̂` minimizing `‖b - Ax‖₂` subject to the
box constraints `lo ≤ x ≤ hi` (elementwise).  (An upper/lower bound may
be `+Inf`/`-Inf`, respectively, to remove a constraint.)
"""
function lsq_box(A::AbstractMatrix{<:Number}, b::AbstractVector{<:Number}, lo::AbstractVector{<:Number}, hi::AbstractVector{<:Number};
                 maxiter::Integer=100, rtol::Real=eps(real(float(eltype(A))))*sqrt(size(A,2)), atol::Real=0)
    Base.require_one_based_indexing(A, b, lo, hi)
    x = A \ b                # start from the unconstrained solution (pivoted/sparse QR)
    @. x = clamp(x, lo, hi)  # project onto the box
    AᵀA = A'*A               # used only for gradient/active-set tests, never to solve
    Aᵀb = A'b
    g = AᵀA*x; g .-= Aᵀb # gradient ∇ₓ of ½‖b - Ax‖²
    # i is inactive if x[i] is strictly inside the box, or sits at a bound from
    # which the descent direction -g points back into the interior (KKT):
    inactive = Bool[lo[i] < hi[i] && (x[i] != lo[i] || g[i] ≤ 0) && (x[i] != hi[i] || g[i] ≥ 0) for i in eachindex(x)]
    all(inactive) && return x
    active = map(!, inactive)
    xprev = copy(x)
    for iter = 1:maxiter
        # re-solve over the inactive coordinates, holding the active ones at their bounds:
        xa = A[:,inactive] \ (b - A[:,active]*x[active])
        x[inactive] = xa
        @. x = clamp(x, lo, hi)
        g .= mul!(g, AᵀA, x) .- Aᵀb # recompute the gradient in place
        for i in eachindex(x)
            inactive[i] = lo[i] < hi[i] && (x[i] != lo[i] || g[i] ≤ 0) && (x[i] != hi[i] || g[i] ≥ 0)
        end
        all(i -> inactive[i] == !active[i], eachindex(active)) && return x # convergence: active set unchanged 
        norm(x - xprev) ≤ max(rtol*norm(x), atol) && return x # convergence: x not changing much
        xprev .= x
        @. active = !inactive
    end
    error("convergence failure: $maxiter iterations reached")
end
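
Usage is just like A \ b plus two vectors of bounds. For example (a quick sanity check that I made up; it's not one of the benchmarks above):

A = randn(100, 10); b = randn(100)
x = lsq_box(A, b, zeros(10), ones(10))  # minimize ‖b - Ax‖₂ subject to 0 ≤ x ≤ 1
extrema(x)                              # always within [0, 1] by construction
# the constrained optimum should do no worse than naively clamping A \ b:
norm(b - A*x) ≤ norm(b - A*clamp.(A \ b, 0, 1))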

Maybe I should stick it into a small package, perhaps with a fallback (or better yet, a convergence proof) to be safe?
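
For what it's worth, here is a minimal sketch of what such an ADMM fallback might look like (my own sketch; lsq_box_admm is a made-up name, and the fixed penalty ρ and iteration count are placeholders for a real stopping rule). The x-update is a stacked least-squares solve, so it reuses the same QR machinery and never forms AᵀA:

using LinearAlgebra

# ADMM sketch (hypothetical): split min ½‖b - Ax‖² + I_[lo,hi](z) s.t. x = z
function lsq_box_admm(A, b, lo, hi; ρ=1.0, maxiter=1000)
    n = size(A, 2)
    # factor the stacked matrix once; cond([A; √ρ*I]) ≤ cond(A), so the
    # condition number is never squared:
    F = qr([A; √ρ * I(n)])
    z = clamp.(zeros(n), lo, hi)      # feasible start
    u = zeros(n)                      # scaled dual variable
    for _ in 1:maxiter
        x = F \ [b; √ρ .* (z .- u)]   # x-update: stacked least-squares solve
        z = clamp.(x .+ u, lo, hi)    # z-update: projection onto the box
        u .+= x .- z                  # dual update
    end
    return z                          # feasible by construction
end

After the one-time factorization, each iteration just applies Qᵀ and a triangular solve, and z stays feasible throughout, so even this crude fixed-iteration version would make a serviceable safety net.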
