JuMP: Please recommend solver

My objective function is a variant of ordinary linear least squares, by which I mean estimating regression coefficients B[1],…,B[p] by minimizing

sum((Y[i]-sum( X[i,j]*B[j] for j=1:p))^2 for i=1:n)

where Y[1],…,Y[n] are n observations and X is an n by p regression matrix.
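For reference, the plain least-squares version doesn't need an optimizer at all; a minimal sketch in Julia (with made-up data, purely for illustration) uses the backslash operator:

```julia
# Ordinary least squares via Julia's backslash (left-division) operator.
# The data below are made up purely for illustration.
n, p = 5, 2
X = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
B_true = [2.0, -1.0]
Y = X * B_true        # noise-free, so the fit recovers B_true exactly
B = X \ Y             # least-squares estimate of the coefficients
```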

My problem is a variant on the above by introducing binary decision variables G[1],…,G[p]

sum((Y[i]-sum( X[i,j]*B[j]*G[j] for j=1:p))^2 for i=1:n)

The difference is that each regression coefficient B[j] is multiplied by a binary 0-1 decision variable G[j].

I believe this problem falls into the MINLP category, and I have consulted the list of solvers that support this type of optimization, but I have not been able to get it to work so far.

I have set up my model as follows

m = Model(solver=some_solver())
@variable(m, G[1:p], Bin)
@variable(m, B[1:p])
@NLobjective(m, Min, sum((Y[i]-sum( X[i,j]*B[j]*G[j] for j=1:p))^2 for i=1:n))
solve(m)

where I have experimented with various choices of some_solver() but have always received an error message.

Can someone confirm whether I have correctly identified this as an MINLP problem and, if so, suggest a solver that would work well? Please also let me know whether I have implemented it correctly.

Much appreciated!

What are the solvers and error messages?

You can actually represent this as a mixed-integer quadratic program (MIQP), which is a vastly easier class of problem to solve than a mixed-integer general nonlinear program. Gurobi.jl is an excellent choice for solving MIQPs, but Mosek and CPLEX are also very good.

The way you do this is to introduce a new variable C and use linear constraints to encode that C[j] == B[j] if G[j] == 1, otherwise C[j] == 0. That gives the same result as C[j] == B[j] * G[j] for binary G[j], but uses only linear constraints.

Here’s a concrete example:

p = 2
n = 3
Y = rand(n)
X = rand(n, p)

B_lb = -10
B_ub = 10

m = Model(solver=GurobiSolver())
@variable(m, G[1:p], Bin)
@variable(m, B_lb <= B[1:p] <= B_ub)
@variable(m, C[1:p])

# Constrain that when G[j] == 1, C[j] must equal B[j]
@constraint(m, C .- B .<= B_ub .* (1 .- G))
@constraint(m, C .- B .>= B_lb .* (1 .- G))

# Constrain that when G[j] == 0, C[j] must equal zero
@constraint(m, C .<= B_ub .* G)
@constraint(m, C .>= B_lb .* G)

@objective(m, Min, sum((Y[i]-sum( X[i,j]*C[j] for j=1:p))^2 for i=1:n))
solve(m)
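To see why the four inequalities above encode C[j] == B[j] * G[j], it helps to check both binary cases on scalars. A quick sanity check (the feasible helper below is hypothetical, not part of the model):

```julia
B_lb, B_ub = -10.0, 10.0
# Hypothetical helper: do scalar values (C, B, G) satisfy the four
# big-M inequalities used in the model above?
feasible(C, B, G) =
    (C - B <= B_ub * (1 - G)) &&
    (C - B >= B_lb * (1 - G)) &&
    (C <= B_ub * G) &&
    (C >= B_lb * G)

@assert feasible(3.0, 3.0, 1)   # G == 1 forces C == B
@assert !feasible(0.0, 3.0, 1)
@assert feasible(0.0, 3.0, 0)   # G == 0 forces C == 0
@assert !feasible(3.0, 3.0, 0)
```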

@rdeits ah this is a really neat trick, and thank you for the example! :smiley: I don't understand how the upper and lower bounds on B at ±10 are motivated, though. I think I've seen this before somewhere but forget the reference; could you point me to one so I can understand how this reformulation works? That'd be great!

@joehuchette My mistake, I played around with Baron a little longer and realized that I hadn't set it up on my machine correctly. It is now working using the above code :) thank you!

You’re welcome! Having to choose bounds for B is an unfortunate necessity of this trick, but for most problems it’s possible to derive reasonable upper and lower bounds. Note that the bounds don’t need to be exact; they just need to be “big enough” to contain all the values of B you’d expect.

There’s some more information here: https://yalmip.github.io/tutorial/bigmandconvexhulls/
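One heuristic for picking the bounds (my own suggestion, not from the thread or the linked tutorial): fit the unrestricted least-squares model first and scale its largest coefficient by a generous safety factor:

```julia
# Made-up data, purely for illustration.
X = [1.0 0.5; 0.2 1.0; 1.5 0.3]
Y = [1.0, 2.0, 0.5]

B_ols = X \ Y                     # unrestricted least-squares fit
M = 10 * maximum(abs.(B_ols))     # safety factor of 10 is a guess
B_lb, B_ub = -M, M                # symmetric big-M bounds
```

If the solver returns a B[j] sitting exactly at B_lb or B_ub, the bounds were too tight and should be widened.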

@rdeits I am wondering if it is necessary to introduce the auxiliary variable C. I have compared two formulations.

#MODEL 1
m = Model(solver=GurobiSolver())
@variable(m, G[1:p], Bin)
@variable(m, B[1:p])
@constraint(m,  B .<=  B_ub * G)
@constraint(m,  B_lb * G .<= B)
@objective(m, Min,  sum((Y[i]-sum( X[i,j]*B[j] for j=1:p))^2 for i=1:n))
solve(m)
#MODEL 2
m = Model(solver=GurobiSolver())
@variable(m, G[1:p], Bin)
@variable(m, B[1:p])
@variable(m, C[1:p])
# Constrain that when G[j] == 1, C[j] must equal B[j]
@constraint(m, C .- B .<= B_ub .* (1 .- G))
@constraint(m, C .- B .>= B_lb .* (1 .- G))
# Constrain that when G[j] == 0, C[j] must equal zero
@constraint(m, C .<= B_ub .* G)
@constraint(m, C .>= B_lb .* G)
@objective(m, Min,  sum((Y[i]-sum( X[i,j]*C[j] for j=1:p))^2 for i=1:n))
solve(m)

The first model doesn’t introduce the auxiliary variable C. Is there any reason that I am missing for why you introduced C?

Nope, your simpler formulation looks just fine. I was just overcomplicating the situation.
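For completeness, Model 1's two constraints can be checked on scalars the same way (the feasible1 helper is hypothetical): when G[j] == 0 they collapse to 0 <= B[j] <= 0, which zeros the coefficient directly.

```julia
B_lb, B_ub = -10.0, 10.0
# Hypothetical helper: do scalar values (B, G) satisfy Model 1's
# two constraints B <= B_ub * G and B_lb * G <= B?
feasible1(B, G) = (B <= B_ub * G) && (B_lb * G <= B)

@assert feasible1(3.0, 1)     # G == 1: B free within [B_lb, B_ub]
@assert !feasible1(11.0, 1)   # ...but still bounded
@assert feasible1(0.0, 0)     # G == 0: only B == 0 survives
@assert !feasible1(3.0, 0)
```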

Ok, just wanted to check and be sure :slight_smile: Thank you for your original suggestion.