Suggestions needed for bound-constrained least squares solver

I’m looking for suggestions for a bound-constrained least-squares solver for

minimize_x  1/2||Ax-b||^2  subj to  lb ≤ x ≤ ub

where A is m-by-n and “tall”, i.e., m >> n (e.g., m = 1000, n = 10). The solver should have low setup costs, since it will be called many thousands of times in sequence with different data A and b (though all instances of A and b will have the same dimensions).

Optim.jl has a bound-constrained solver, Fminbox, but it’s a primal barrier method and not especially efficient.

Are there Julia solvers that implement, say, appropriate projected-gradient or active-set methods?

Also, if you’re willing to go into overkill mode, check out JuMP. You can formulate your problem as a conic program and solve it with, for example, ProxSDP, which supports orthant and second-order cones.
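For reference, here is a minimal JuMP sketch of that conic formulation, using ProxSDP as the example solver (any SOC-capable solver would do; the epigraph variable t is the usual trick for minimizing a norm):

```julia
using JuMP, ProxSDP, LinearAlgebra

m, n = 1000, 10
A = randn(m, n); b = A * randn(n)
lb = zeros(n); ub = ones(n)

model = Model(ProxSDP.Optimizer)
set_silent(model)
@variable(model, lb[i] <= x[i = 1:n] <= ub[i])
@variable(model, t)
# epigraph form: minimize t subject to t >= ||A*x - b||
@constraint(model, [t; A * x - b] in SecondOrderCone())
@objective(model, Min, t)
optimize!(model)
xsol = value.(x)
```

Note this minimizes ||Ax - b|| rather than its square, which has the same minimizers over the box.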


How similar are the problems?

@mpf01 There’s TRON: It would be pretty easy to adapt the CG to accommodate a least-squares objective.


NLopt has a variety of methods that can handle bound constraints, but they aren’t specialized for this particular objective.
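To make that concrete, a hedged sketch of calling one of NLopt's gradient-based methods on this objective (the choice of :LD_LBFGS is illustrative; any bound-capable gradient method in NLopt would work the same way):

```julia
using NLopt, LinearAlgebra

m, n = 1000, 10
A = randn(m, n); b = A * randn(n)

opt = Opt(:LD_LBFGS, n)
opt.lower_bounds = zeros(n)
opt.upper_bounds = ones(n)
opt.min_objective = (x, grad) -> begin
    r = A * x - b
    if length(grad) > 0
        grad .= A' * r        # gradient of 1/2||Ax - b||^2
    end
    0.5 * dot(r, r)
end
opt.ftol_rel = 1e-10
minf, minx, ret = optimize(opt, fill(0.5, n))
```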


@dpo Thanks for the pointer to TRON. Here’s a quick comparison between TRON (projected second-order descent) and ECOS (primal-dual interior point). The ECOS experiment uses a small custom wrapper around the ECOS.jl solver to translate the box-constrained problem into the second-order cone program

minimize  t  subj to  Ax+r=b, (r,t) in SOC, 0 ≤ x ≤ u.

Here’s the setup:

using JSOSolvers, NLPModels
using ECOSProblems #
using BenchmarkTools
using LinearAlgebra

m, n = 10000, 10
A = randn(m, n); x = randn(n); b = A*x;
bl = zeros(n); bu = ones(n)

And here are the benchmarks, which show a clear advantage for TRON:

julia> @benchmark ECOSProblems.lsbox($A, $b, $bu, verbose=0)
  memory estimate:  11.10 MiB
  allocs estimate:  495
  minimum time:     250.028 ms (0.00% GC)
  median time:      253.803 ms (0.00% GC)
  mean time:        254.501 ms (0.43% GC)
  maximum time:     260.866 ms (1.62% GC)
  samples:          20
  evals/sample:     1

julia> @benchmark tron(LLSModel($A, $b, lvar=$bl, uvar=$bu))
  memory estimate:  6.73 MiB
  allocs estimate:  1758
  minimum time:     4.248 ms (0.00% GC)
  median time:      5.061 ms (0.00% GC)
  mean time:        5.958 ms (15.21% GC)
  maximum time:     12.798 ms (53.19% GC)
  samples:          840

The solutions are close:

julia> xecos = ECOSProblems.lsbox(A, b, bu, verbose=0)[1];

julia> xtron = (tron(LLSModel(A, b, lvar=bl, uvar=bu))).solution;

julia> println(norm(xecos-xtron,Inf))

As @dpo suggested, further improvements are likely possible with some tweaks to the TRON interface.

However, the ECOS experiment could probably be improved significantly by exploiting the small number of columns in A:

  • factor A = QR
  • apply ECOS to the equivalent problem
minimize  t - <A'b,x>  subj to  ||Rx||^2 ≤ 2t (a rotated SOC constraint), 0 ≤ x ≤ u,

which has a much smaller dual space.
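The equivalence rests on A = QR giving ||Ax - b||^2 = ||Rx||^2 - 2<A'b, x> + const, so the two objectives differ only by a constant and share the same minimizers over the box. A quick numerical check (variable names are illustrative):

```julia
using LinearAlgebra

m, n = 10_000, 10
A = randn(m, n); b = randn(m)
F = qr(A)
R = Matrix(F.R)        # n-by-n triangular factor of the thin QR
c = A' * b

# 1/2||Az - b||^2 and 1/2||Rz||^2 - <A'b, z> differ only by 1/2||b||^2,
# a constant in z, so objective differences between any two points agree.
f(z) = 0.5 * norm(A * z - b)^2
g(z) = 0.5 * norm(R * z)^2 - dot(c, z)

x, y = rand(n), rand(n)
@assert isapprox(f(x) - f(y), g(x) - g(y); rtol = 1e-6)
```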

The sequence of LS problems all have the same dimensions, but different data in A and b. It’s possible that the optimal active sets converge.

I recently had to solve a very similar sequence of related QPs.

I found that a combination of LBFGS-B and a heuristic Newton-type method works quite well (if the problem is not horribly conditioned). If you have a guess for the active set, you can also use a Newton-type method that isn’t guaranteed to converge: check the KKT conditions afterwards and fall back to LBFGS-B if they don’t hold.
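A rough sketch of that heuristic, assuming the QP form minimize 1/2 x'Qx - c'x with Q = A'A and c = A'b (the function names here are made up for illustration):

```julia
using LinearAlgebra

# Given a guess of which bounds are active, fix those variables, solve the
# reduced normal equations on the free set, and clip back into the box.
function newton_guess(Q, c, lb, ub, mask_l, mask_u)
    n = length(c)
    x = zeros(n)
    x[mask_l] .= lb[mask_l]
    x[mask_u] .= ub[mask_u]
    free  = .!(mask_l .| mask_u)
    fixed = .!free
    x[free] = Q[free, free] \ (c[free] - Q[free, fixed] * x[fixed])
    clamp.(x, lb, ub)
end

# KKT check for min 1/2 x'Qx - c'x over lb <= x <= ub: the gradient g = Qx - c
# must be >= 0 at active lower bounds, <= 0 at active upper bounds, 0 elsewhere.
function kkt_satisfied(Q, c, lb, ub, x; tol = 1e-8)
    g = Q * x - c
    all(i -> (x[i] <= lb[i] + tol && g[i] >= -tol) ||
             (x[i] >= ub[i] - tol && g[i] <= tol) ||
             abs(g[i]) <= tol, eachindex(x))
end
```

If `kkt_satisfied` fails, fall back to LBFGS-B as described above.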

You could probably get similar performance out of TRON on the QP version of the problem but I haven’t looked into it…

Here are the results on my machine

using Logging, BenchmarkTools, ECOSProblems, JSOSolvers, NLPModels
using BoundedLeastSquares # from


m, n = 10000, 10
A = randn(m, n); x = randn(n); b = A*x;
bl = zeros(n); bu = ones(n)
Q = form_quadratic_from_least_squares(A, b)
K = sqrt(Q.Q); b_K = K\(Q.b)
x_opt = min_bound_constrained_quadratic(Q,bl,bu)

# From BoundedLeastSquares
@btime min_bound_constrained_quadratic($Q, $bl, $bu) # 29.028 μs (58 allocations: 18.73 KiB)
@btime active_set_min_bound_constrained_quadratic($Q, $bl, $bu) # 3.425 μs (66 allocations: 8.91 KiB)
# Faster if we know the active constraints...
@btime active_set_min_bound_constrained_quadratic($Q, $bl, $bu; mask_l = $(x_opt .== bl), mask_u = $(x_opt .== bu)) # 1.688 μs (32 allocations: 3.67 KiB)

@btime ECOSProblems.lsbox($A, $b, $bu, verbose=0) # 209.546 ms (489 allocations: 11.10 MiB)
@btime tron(LLSModel($A, $b, lvar=$bl, uvar=$bu)) # 12.344 ms (10004 allocations: 35.45 MiB)

## This comparison isn't really fair, try the n by n problem...
@btime tron(LLSModel($K, $b_K, lvar=$bl, uvar=$bu)) # 945.953 μs (7611 allocations: 451.27 KiB)

### Time is dominated by time to form normal equations
@btime form_quadratic_from_least_squares($A, $b) # 155.333 μs (3 allocations: 1.06 KiB)

@nboyd LBFGSB.jl works well and has minimal overhead. Thanks for the pointer. I really like your Newton-type heuristic: it seems to succeed often enough to offer significant savings, even with the occasional fallback to LBFGSB.

Does OSQP.jl work? A colleague tested it a year ago, and his experience was that it was some 4-5 times faster than the MATLAB version.
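For reference, a minimal sketch of posing the box-constrained LS as a QP for OSQP.jl (the settings shown are illustrative defaults, not tuned):

```julia
using OSQP, SparseArrays, LinearAlgebra

m, n = 10_000, 10
A = randn(m, n); b = A * randn(n)
lb = zeros(n); ub = ones(n)

# QP form: minimize 1/2 x'(A'A)x - (A'b)'x  subject to  lb <= I*x <= ub
model = OSQP.Model()
OSQP.setup!(model; P = sparse(A' * A), q = -A' * b,
            A = sparse(1.0I, n, n), l = lb, u = ub, verbose = false)
res = OSQP.solve!(model)
res.x
```

As with the BoundedLeastSquares timings above, the cost of forming A'A dominates when n is this small.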

We had a similar question in the Julia Slack.

This is a simple problem to solve with projected gradient descent.
Once you add an acceleration method (Nesterov / FISTA), you get convergence within tens of iterations, where each iteration is as simple as you could imagine (a single matrix-vector multiplication plus vector clipping).
So it should be competitive with Newton-based methods in real-world run time (including overhead), yet much more robust.
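A minimal FISTA-style sketch of that idea, with the step size taken from the Lipschitz constant ||A||_2^2 (the function name is illustrative):

```julia
using LinearAlgebra

# Accelerated projected gradient for  min 1/2||Ax - b||^2  s.t.  lb <= x <= ub.
# Each iteration is one matrix-vector product pair plus a clamp (the projection).
function fista_box(A, b, lb, ub; iters = 200)
    n = size(A, 2)
    L = opnorm(A)^2                        # Lipschitz constant of the gradient
    x = clamp.(zeros(n), lb, ub)
    y = copy(x); tk = 1.0
    for _ in 1:iters
        g = A' * (A * y - b)               # gradient at the extrapolated point
        xnew = clamp.(y .- g ./ L, lb, ub) # gradient step + box projection
        tnew = (1 + sqrt(1 + 4 * tk^2)) / 2
        y = xnew .+ ((tk - 1) / tnew) .* (xnew .- x)
        x, tk = xnew, tnew
    end
    x
end
```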

I solved it, including code, in StackExchange Mathematics Q3558240 - How to Solve Linear Least Squares Problem with Box Constraints.

I wrote up both the question and the answer there in response to the question raised in the Julia Slack.


You can also give a try to StructuredOptimization.jl:

using StructuredOptimization

m, n = 10000, 10
A = randn(m,n)
b = randn(m)
x = Variable(n)
lb,ub = -0.01, 0.01

@minimize ls(A*x-b) st x in [lb;ub]

Solvers utilize Proximal Gradient algorithms.