Suggestions needed for bound-constrained least squares solver

I’m looking for suggestions for a bound-constrained least-squares solver for

minimize_x  1/2||Ax-b||^2  subj to  lb ≤ x ≤ ub

where A is m-by-n and “tall”, ie, m >> n (eg, m=1K, n=10). The solver should have low setup costs, since it will be called consecutively many thousands of times with different data A and b (though all instances of A and b will have the same dimensions).

Optim.jl has a bound constrained solver Fminbox, but it’s a primal barrier method, and not especially efficient.

Are there Julia solvers that implement, say, appropriate projected-gradient or active-set methods?

1 Like

There is https://github.com/rdeits/NNLS.jl. Also if you are willing to go overkill mode, check out JuMP. You can formulate your problem as a conic program and solve it with ProxSDP for example which supports orthant and second order cones.

1 Like

How similar are the problems?

@mpf01 There’s TRON: https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/blob/master/src/tron.jl. It would be pretty easy to adapt the CG to accommodate a least-squares objective.

3 Likes

NLopt has a variety of methods that can handle bound constraints, but they aren’t specialized for this particular objective.

2 Likes

@dpo Thanks for the pointer TRON. Here’s a quick comparison between TRON (projected second-order descent) vs ECOS (primal-dual interior). The ECOS experiment uses a small custom wrapper around the ECOS.jl solver to translate the box-constrained solver into the second-order cone program

minimize  t  subj to  Ax+r=b, (r,t) in SOC, 0 ≤ x ≤ u.

Here’s the setup:

using JSOSolvers, NLPModels
using ECOSProblems # https://github.com/mpf/ECOSProblems.jl
using BenchmarkTools
using LinearAlgebra

global_logger(NullLogger())
m, n = 10000, 10
A = randn(m, n); x = randn(n); b = A*x;
bl = zeros(n); bu = ones(n)

And here are the benchmarks, which shows a clear advantage to TRON:

julia> @benchmark ECOSProblems.lsbox($A, $b, $bu, verbose=0)
BenchmarkTools.Trial:
  memory estimate:  11.10 MiB
  allocs estimate:  495
  --------------
  minimum time:     250.028 ms (0.00% GC)
  median time:      253.803 ms (0.00% GC)
  mean time:        254.501 ms (0.43% GC)
  maximum time:     260.866 ms (1.62% GC)
  --------------
  samples:          20
  evals/sample:     1

julia> @benchmark tron(LLSModel($A, $b, lvar=$bl, uvar=$bu))
BenchmarkTools.Trial:
  memory estimate:  6.73 MiB
  allocs estimate:  1758
  --------------
  minimum time:     4.248 ms (0.00% GC)
  median time:      5.061 ms (0.00% GC)
  mean time:        5.958 ms (15.21% GC)
  maximum time:     12.798 ms (53.19% GC)
  --------------
  samples:          840

The solutions are close:

julia> xecos = ECOSProblems.lsbox(A, b, bu, verbose=0)[1];

julia> xtron = (tron(LLSModel(A, b, lvar=bl, uvar=bu))).solution;

julia> println(norm(xecos-xtron,Inf))
1.2407727578850336e-6

As @dpo suggested, further improvements are likely possible with some tweaks to the TRON interface.

However, the ECOS experiment probably could be significantly improved by taking advantage of the very few number of columns in A:

  • factor A = QR
  • apply ECOS to the equivalent problem
minimize  t - <A'b,x>  subj to  (||Rx||,t) in rotated SOC, 0 ≤ x ≤ u,

which has a much smaller dual space.

The sequence of LS problems all have the same dimensions, but different data in A and b. It’s possible that the optimal active sets converge.

I recently had to solve a very similar sequence of related QPs.

I found that a combination of LBFGS-B and a heuristic newton-type method works quite well (if the problem is not horribly conditioned). If you have a guess for the active set you can also use a newton-type method that isn’t guaranteed to converge — you can check the KKT conditions and fall back to LBFGS-B if it doesn’t.

You could probably get similar performance out of TRON on the QP version of the problem but I haven’t looked into it…

Here are the results on my machine

using Logging, BenchmarkTools, ECOSProblems, JSOSolvers, NLPModels
using BoundedLeastSquares # from https://github.com/nboyd/BoundedLeastSquares.jl

global_logger(NullLogger())

m, n = 10000, 10
A = randn(m, n); x = randn(n); b = A*x;
bl = zeros(n); bu = ones(n)
Q = form_quadratic_from_least_squares(A, b)
K = sqrt(Q.Q); b_K = K\(Q.b)
x_opt = min_bound_constrained_quadratic(Q,bl,bu)

# From BoundedLeastSquares
@btime min_bound_constrained_quadratic($Q, $bl, $bu) # 29.028 μs (58 allocations: 18.73 KiB)
@btime active_set_min_bound_constrained_quadratic($Q, $bl, $bu) # 3.425 μs (66 allocations: 8.91 KiB)
# Faster if we know the active constraints...
@btime active_set_min_bound_constrained_quadratic($Q, $bl, $bu; mask_l = $(x_opt .== bl), mask_u = $(x_opt .== bu)) # 1.688 μs (32 allocations: 3.67 KiB)

@btime ECOSProblems.lsbox($A, $b, $bu, verbose=0) # 209.546 ms (489 allocations: 11.10 MiB)
@btime tron(LLSModel($A, $b, lvar=$bl, uvar=$bu)) # 12.344 ms (10004 allocations: 35.45 MiB)

## This comparison isn't really fair, try the n by n problem...
@btime tron(LLSModel($K, $b_K, lvar=$bl, uvar=$bu)) # 945.953 μs (7611 allocations: 451.27 KiB)

### Time is dominated by time to form normal equations
@btime form_quadratic_from_least_squares($A, $b) # 155.333 μs (3 allocations: 1.06 KiB)
2 Likes

@nboyd LBFGSB.jl works well, and has minimal overhead. Thanks for the pointer. I really like your Newton-type heuristic — it seems to work often enough that it offers significant savings, even if a fallback to LBFGSB is required.

Does OSQP.jl work? A colleague tested it a year ago, and his experience was that it was some 4-5 times faster than the MATLAB version.

We had similar question in Julia Slack.

This is a simple problem to solve with Projected Gradient Descent.
Once you add some Acceleration Method (Nesterov / FISTA) you get convergence within tens of iterations where each iteration is as simple as you could imagine (Single Matrix Vector multiplication + Vector Clipping).
So it should be competitive with Newton based method (For real world run time, including overhead) yet be much more robust.

I solved it, including code, in StackExchange Mathematics Q3558240 - How to Solve Linear Least Squares Problem with Box Constraints.

I wrote the question and the answer as an answer to the question raised in Julia Slack.

3 Likes

You can also give a try to StructuredOptimization.jl:

using StructuredOptimization

m, n = 10000, 10
A = randn(m,n)
b = randn(m)
x = Variable(n)
lb,ub = -0.01, 0.01

@minimize ls(A*x-b) st x in [lb;ub]

Solvers utilize Proximal Gradient algorithms.

1 Like

I hate to revive an old thread, but I had a desire for this recently and I wanted something that:

  1. Does not square the condition number — a lot of the solutions in this thread seem to work by explicitly forming A^T A and minimizing the QP \frac{1}{2} x^T A^T A x - b^T x subject to bound constraints. In contrast, I want something like A \ b that uses pivoted QR to avoid exacerbating inaccuracies for ill-conditioned A.
  2. Supports sparse A (via the sparse-QR that A \ b uses).

I came up with a 20-line function that uses a Newton-like algorithm of repeatedly updating the active set (similar in spirit to BoundedLeastSquares.jl from above, I think?), and relies only on LinearAlgebra and A \ b. It handily outperforms all of the generic-optimization packages above (though StructuredOptimization comes close!), is only a bit slower than NNLS.jl (which looks like it squares the condition number, and doesn’t handle upper bounds), and seems to be a few times slower than BoundedLeastSquares.jl primarily because of the cost of QR factorization (to avoid squaring the condition number).

I haven’t yet proved that it must converge, but I haven’t been able to find a counter-example yet — it always seems to converge in ≤ 4 iterations (even for seemingly tricky problems where the unconstrained minimum is close to the bounds in many coordinates). It would be easy enough to add a fallback to something simple like ADMM (while still avoiding squaring the condition number), I suppose. I figured I would post it here in case people are searching this thread.

using LinearAlgebra

"""
    lsq_box(A::AbstractMatrix, b::AbstractVector, lo::AbstractVector, hi::AbstractVector)

Return the least-square solution `x̂` minimizing `‖b - Ax‖₂` subject to the
box constraints `lo ≤ x ≤ hi` (elementwise).  (An upper/lower bound may
be `±Inf`, respectively, to remove a constraint.)
"""
function lsq_box(A::AbstractMatrix{<:Number}, b::AbstractVector{<:Number}, lo::AbstractVector{<:Number}, hi::AbstractVector{<:Number};
                 maxiter::Integer=100, rtol::Real=eps(float(eltype(A)))*sqrt(size(A,2)), atol::Real=0)
    Base.require_one_based_indexing(A, b, lo, hi)
    x = A \ b
    @. x = clamp(x, lo, hi)
    AᵀA = A'*A
    Aᵀb = A'b
    g = AᵀA*x; g .-= Aᵀb # gradient ∇ₓ of ½‖b - Ax‖²
    inactive = Bool[lo[i] < hi[i] && (x[i] != lo[i] || g[i] ≤ 0) && (x[i] != hi[i] || g[i] ≥ 0) for i in eachindex(x)]
    all(inactive) && return x
    active = map(!, inactive)
    xprev = copy(x)
    for iter = 1:maxiter
        xa = A[:,inactive] \ (b - A[:,active]*x[active])
        x[inactive] = xa
        @. x = clamp(x, lo, hi)
        g .= mul!(g, AᵀA, x) .- Aᵀb
        for i in eachindex(x)
            inactive[i] = lo[i] < hi[i] && (x[i] != lo[i] || g[i] ≤ 0) && (x[i] != hi[i] || g[i] ≥ 0)
        end
        all(i -> inactive[i] == !active[i], eachindex(active)) && return x # convergence: active set unchanged 
        norm(x - xprev) ≤ max(rtol*norm(x), atol) && return x # convergence: x not changing much
        xprev .= x
        @. active = !inactive
    end
    error("convergence failure: $maxiter iterations reached")
end

Maybe I should stick it into a small package, perhaps with a fallback (or better yet, a convergence proof) to be safe?

21 Likes

Sorry to continue to revive this old thread but I have to say @stevengj your algorithm is very beautiful! I worry because its so pretty, so why is it not in an ancient linear programming textbook? In tests I also have only found rapid convergence.

I modified your code directly to add optimizations appropriate for my use case (many problems with a common A). I amortized the psudoinverse on the complete matrix, so the first pass is fast, later passes tend to have a much reduced space. I preallocate all I can and it really zips.

I only wish it was in a package so that we could test it better. Fortunately, my problems are largely easy (few active constraints!).

The algorithm above looks very much like a special case of Bertsekas’s projected Newton method specialized to the case of a linear least-squares objective. It’s already covered by convergence theory.

Since I wrote above, there is TRONLS, a specialized version of TRON to least-squares objectives. TRON may be seen as a large-scale version of Bertsekas’s method. Solvers · JSOSolvers.jl

There is also a predictor-corrector interior-point method implemented in RipQP: GitHub - JuliaSmoothOptimizers/RipQP.jl. You can feed the problem to RipQP after formulating it as

\begin{align*} \min_{x,r} & \ \tfrac{1}{2} \|r\|^2 \\ \mathrm{subject \ to} & \ Ax + r = b \\ & \ell \leq x \leq u. \end{align*}
7 Likes