I hate to revive an old thread, but I needed this recently, and I wanted something that:
- Does not square the condition number: a lot of the solutions in this thread seem to work by explicitly forming $A^T A$ and minimizing the QP $\frac{1}{2} x^T A^T A x - b^T A x$ subject to bound constraints. In contrast, I want something like `A \ b` that uses pivoted QR to avoid exacerbating inaccuracies for ill-conditioned $A$ (see the quick sketch right after this list).
- Supports sparse $A$ (via the sparse QR that `A \ b` uses).
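To illustrate the first point, here is a minimal sketch (my own toy example, not a benchmark from this thread) of how forming $A^T A$ roughly squares the condition number and costs accuracy, compared to the QR-based `A \ b`:

```julia
using LinearAlgebra
ϵ = 1e-7
A = [1.0 1.0; ϵ 0.0; 0.0 ϵ]  # Läuchli-style matrix: cond(A) ≈ √2/ϵ ≈ 1.4e7
b = A * [1.0, 2.0]           # right-hand side with known exact solution [1, 2]
x_qr = A \ b                 # QR-based least squares: error ~ eps() * cond(A)
x_ne = (A'A) \ (A'b)         # normal equations: cond(AᵀA) ≈ cond(A)^2
norm(x_qr - [1, 2]), norm(x_ne - [1, 2])  # QR error is orders of magnitude smaller
```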
I came up with a 20-line function that uses a Newton-like algorithm of repeatedly updating the active set (similar in spirit to BoundedLeastSquares.jl from above, I think?), and relies only on LinearAlgebra and `A \ b`. It handily outperforms all of the generic-optimization packages above (though StructuredOptimization comes close!), is only a bit slower than NNLS.jl (which looks like it squares the condition number, and doesn’t handle upper bounds), and seems to be a few times slower than BoundedLeastSquares.jl, primarily because of the cost of the QR factorization (to avoid squaring the condition number).
I haven’t yet proved that it must converge, but I haven’t been able to find a counter-example yet — it always seems to converge in ≤ 4 iterations (even for seemingly tricky problems where the unconstrained minimum is close to the bounds in many coordinates). It would be easy enough to add a fallback to something simple like ADMM (while still avoiding squaring the condition number), I suppose. I figured I would post it here in case people are searching this thread.
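(As an aside, here is the kind of stress test I have in mind; it is just an illustrative sketch of my own, and it uses the `lsq_box` function defined below.)

```julia
# Illustrative stress test (my construction): a tight box that the
# unconstrained optimum x₀ = A \ b violates in most coordinates.
using LinearAlgebra
A = randn(100, 30); b = randn(100)
lo = fill(-0.01, 30); hi = fill(0.01, 30)
x = lsq_box(A, b, lo, hi)
# Check the KKT conditions (up to roundoff): the gradient g = Aᵀ(Ax - b) must be
# ≥ 0 at a lower bound, ≤ 0 at an upper bound, and ≈ 0 on free coordinates.
g = A' * (A*x - b)
@assert all(i -> (x[i] == lo[i] && g[i] ≥ -1e-8) ||
                 (x[i] == hi[i] && g[i] ≤ 1e-8) ||
                 abs(g[i]) ≤ 1e-8, eachindex(x))
```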
```julia
using LinearAlgebra

"""
    lsq_box(A::AbstractMatrix, b::AbstractVector, lo::AbstractVector, hi::AbstractVector)

Return the least-squares solution `x̂` minimizing `‖b - Ax‖₂` subject to the
box constraints `lo ≤ x ≤ hi` (elementwise). (An upper/lower bound may
be `±Inf`, respectively, to remove a constraint.)
"""
function lsq_box(A::AbstractMatrix{<:Number}, b::AbstractVector{<:Number},
                 lo::AbstractVector{<:Number}, hi::AbstractVector{<:Number};
                 maxiter::Integer=100, rtol::Real=eps(float(eltype(A)))*sqrt(size(A,2)), atol::Real=0)
    Base.require_one_based_indexing(A, b, lo, hi)
    x = A \ b                # start from the unconstrained solution (pivoted QR)
    @. x = clamp(x, lo, hi)  # project onto the box
    AᵀA = A'*A               # used only to evaluate gradients cheaply, never to solve
    Aᵀb = A'b
    g = AᵀA*x; g .-= Aᵀb     # gradient ∇ₓ of ½‖b - Ax‖²
    # a bound is inactive if x is strictly inside it, or if the gradient
    # pushes x back into the interior (the KKT sign condition):
    inactive = Bool[lo[i] < hi[i] && (x[i] != lo[i] || g[i] ≤ 0) && (x[i] != hi[i] || g[i] ≥ 0) for i in eachindex(x)]
    all(inactive) && return x
    active = map(!, inactive)
    xprev = copy(x)
    for iter = 1:maxiter
        # unconstrained least-squares over the inactive coordinates,
        # with the active coordinates pinned at their bounds:
        xa = A[:,inactive] \ (b - A[:,active]*x[active])
        x[inactive] = xa
        @. x = clamp(x, lo, hi)
        g .= mul!(g, AᵀA, x) .- Aᵀb
        for i in eachindex(x)
            inactive[i] = lo[i] < hi[i] && (x[i] != lo[i] || g[i] ≤ 0) && (x[i] != hi[i] || g[i] ≥ 0)
        end
        all(i -> inactive[i] == !active[i], eachindex(active)) && return x # convergence: active set unchanged
        norm(x - xprev) ≤ max(rtol*norm(x), atol) && return x # convergence: x not changing much
        xprev .= x
        @. active = !inactive
    end
    error("convergence failure: $maxiter iterations reached")
end
```
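For example (a usage sketch of my own), nonnegative least squares is just the special case `lo = 0`, `hi = +Inf`:

```julia
A = randn(100, 10); b = randn(100)
x = lsq_box(A, b, zeros(10), fill(Inf, 10))  # NNLS: lower bound 0, no upper bound
all(x .≥ 0)  # true
```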
Maybe I should stick it into a small package, perhaps with a fallback (or better yet, a convergence proof) to be safe?