Help with constraints to select and/or insert columns to a matrix

I suppose this might be more of a generic MIP formulation question, but I'm not sure Stack Exchange would accept it, and here I can at least provide some almost-working code using JuMP.

The simple, context-free version of the question: I have one array of binary variables, select, and one array of integer variables (with lower bound 0), insert, both of the same size, and I would like to constrain insert so that insert[i] == 0 for all i > sum(select) + sum(insert). Are there any MIP tricks which can achieve this?

Context which might help find alternate formulations:

NaiveNASlib has functions to modify the structure of a neural network while ensuring that a) the dimensions of the parameter arrays stay consistent and b) the model, to the largest extent possible, represents the same function as before. To be able to handle any (?) kind of architecture it uses a MIP formulation. I'm looking at making the formulation a bit more generic by loosening some constraints, but I'm not sure exactly how to go about it.

The current approach is to create variables select and insert for each vertex in the computation graph, where both select and insert are arrays of binary variables. To keep things simple here, assume the parameter array is a matrix; then select[i] means that we keep column i from that matrix and insert[i] means that we insert a new column after column i. As a concrete example, the result select = [0, 1, 1, 0, 1] and insert = [0, 0, 1, 0, 0] means that the new matrix has four columns, where Mnew[:, 1] = Mold[:, 2], Mnew[:, 2] = Mold[:, 3], Mnew[:, 3] .= 0 and Mnew[:, 4] = Mold[:, 5].
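
To make the semantics concrete, here is a rough sketch (only for illustration, not code from NaiveNASlib) of how such a select/insert result could be applied to a matrix:

        # Apply a binary select/insert result to a parameter matrix.
        # Kept columns are copied; inserted columns are zero-initialized.
        function apply_select_insert(Mold, select, insert)
            cols = Vector{Vector{eltype(Mold)}}()
            for i in eachindex(select)
                select[i] == 1 && push!(cols, Mold[:, i])
                insert[i] == 1 && push!(cols, zeros(eltype(Mold), size(Mold, 1)))
            end
            return reduce(hcat, cols)
        end

        Mold = reshape(collect(1.0:15.0), 3, 5)
        Mnew = apply_select_insert(Mold, [0, 1, 1, 0, 1], [0, 0, 1, 0, 0])
        # Mnew columns: Mold[:, 2], Mold[:, 3], zeros, Mold[:, 5]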

Some extra context which is a little beside the question: The way goals a) and b) are met is by adding constraints based on what category of operation each vertex belongs to. For example, if we hit a vertex V which does an elementwise operation on its inputs, we add the constraints select_V .== select_Vi0 .== select_Vi1 …, where select_V is the select variable for the output of V and select_Vin is the select variable for input n to V. The same is done for insert too, of course. If a concatenation is hit we instead add constraints which make portions of select_V equal to the corresponding inputs. For both elementwise ops and concatenations there are typically no parameters to select in V itself, but the select_V variable is used to propagate the constraints and tie everything together (e.g. to handle multiple levels of nested elementwise ops and concatenations).
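
A rough sketch of that idea (variable names made up here, not the actual NaiveNASlib code):

        using JuMP

        model = Model()
        # select variables for two inputs and the output of an elementwise vertex
        select_in1 = @variable(model, [1:4], Bin)
        select_in2 = @variable(model, [1:4], Bin)
        select_out = @variable(model, [1:4], Bin)

        # Elementwise op: inputs and output must select the same indices
        @constraint(model, select_out .== select_in1)
        @constraint(model, select_out .== select_in2)

        # Concatenation: each portion of the output equals the corresponding input
        select_cat = @variable(model, [1:8], Bin)
        @constraint(model, select_cat[1:4] .== select_in1)
        @constraint(model, select_cat[5:8] .== select_in2)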

The first constraint I would like to relax is that currently sum(select) + sum(insert) == length(insert). This constraint is a crude way to protect against (what I think are) impossible outcomes, such as “select columns 2, 4, 5 and insert a new column at position 20” (what should go in columns 4-19 of the new matrix?).
The second relaxation I would like is to make insert an integer variable (with lower bound 0, of course). This would allow the optimizer to insert any number of neurons instead of being artificially restricted by the length of the insert variable (although I might want to add an upper bound on the sum to prevent models from randomly exploding in size).
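
For the second relaxation I imagine something along these lines, where maxinstotal is a made-up cap and model is set up as in the MWE further down:

        maxinstotal = 20  # made-up cap on the total number of inserted neurons
        JuMP.@variable(model, 0 <= insert[1:10] <= maxinstotal, Int)
        JuMP.@constraint(model, sum(insert) <= maxinstotal)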

The first one seems trickier, but the second might add extra complications. To disallow gaps, I would like a constraint which says that the total number of selected and inserted columns must be greater than or equal to the largest non-zero index of insert, but how do I formulate that?

I have experimented a bit with an auxiliary binary variable array which is constrained to be non-zero where both insert and select are zero, and to be consecutive at the end. It might work well enough, but it imposes the unnecessary constraint that insert[i] > 0 whenever select[i] = 0. Example:

        import JuMP
        import Cbc
        import JuMP: @variable, @constraint, @objective, @expression

        model =  JuMP.Model(JuMP.with_optimizer(Cbc.Optimizer, loglevel=0))

        # Which of the existing ten neurons do we want to select
        select = JuMP.@variable(model, select[1:10], Bin)
        
        # Max number of neurons we allow to insert (needed both to prevent models from exploding and for big-M strategy below to work) 
        maxins = 10
        # insert[i] = How many new neurons to insert at position i
        insert = JuMP.@variable(model, insert[1:10], Int, upper_bound=maxins, lower_bound = 0)

        # Number here is specified by user (e.g. I want to resize this layer to only have 8 neurons).
        JuMP.@constraint(model, sum(select) + sum(insert) == 8)

        # colast = "constrain last" (name shortened so the printout aligns). This should force the optimizer to only allow select[i] == 0 and insert[i] == 0 simultaneously in the last part, i.e. there are no gaps
        colast = JuMP.@variable(model, colast[1:10], Bin)
        # colast[i] is one iff both select[i] and insert[i] are zero, using a big-M strategy.
        JuMP.@constraint(model, [i = 1:10], 2maxins * (1-colast[i]) <= 2maxins * (select[i] + insert[i]))
        JuMP.@constraint(model, [i = 1:10], 2maxins * (1-colast[i]) >= select[i] + insert[i])
        # colast must be nondecreasing, i.e. once colast is one it stays one, so the ones form a single consecutive block at the end (if any).
        JuMP.@constraint(model, [i=2:10], 0 <= colast[i] - colast[i-1] <= 1)

        # value[i] = score for select[i]; let's make the last neurons more attractive here
        value = collect(1:10)
        value[6] = -100 # Avoid select[6], colast should force us to insert at this point instead of just skipping it
        JuMP.@objective(model, Max, sum(value .* select))

        JuMP.optimize!(model)

        if JuMP.termination_status(model)  == JuMP.MOI.OPTIMAL
            @show round.(Int, JuMP.value.(select))
            @show round.(Int, JuMP.value.(insert))
            @show round.(Int, JuMP.value.(colast))
        else
          @show  JuMP.termination_status(model)
        end

This is one step closer:

        import JuMP
        import Cbc
        import JuMP: @variable, @constraint, @objective, @expression

        model =  JuMP.Model(JuMP.with_optimizer(Cbc.Optimizer, loglevel=0))

        # Which of the existing neurons do we want to select
        select = JuMP.@variable(model, select[1:10], Bin)
        
        # Max number of neurons we allow to insert (needed both to prevent models from exploding and for big-M strategy below to work) 
        maxins = 10
        # insert[i] = How many new neurons to insert at position i
        insert = JuMP.@variable(model, insert[1:10], Int, upper_bound=maxins, lower_bound = 0)

        # Number here is specified by user (e.g. I want to resize this layer to only have 8 neurons).
        JuMP.@constraint(model, sum(select) + sum(insert) == 7)

        # colast = "constrain last" (name shortened so the printout aligns). This should force the optimizer to only allow insert[i] > 0 outside the last part, i.e. there are no gaps
        colast = JuMP.@variable(model, colast[1:10], Bin)
        # colast[i] must be zero if insert[i] > 0, using a big-M strategy.
        JuMP.@constraint(model, [i = 1:10], 2maxins * (1-colast[i]) >= insert[i])
        
        # colast must be nondecreasing, i.e. once colast is one it stays one, so the ones form a single consecutive block at the end (if any).
        JuMP.@constraint(model, [i=2:10], 0 <= colast[i] - colast[i-1] <= 1)
        # This should block insert from being > 0 for (at least) the last 10 - sum(select) indices
        JuMP.@constraint(model, 10-sum(select) <= sum(colast))


        # value[i] = score for select[i]; let's make the last neurons more attractive here
        value = collect(1:10)
        value[3:7] .= -100 # Avoid select[3:7], colast should force us to insert after position 5
        insval = collect(1:10) .* 0.1 # Just to make sure we don't get the right answer by accident
        JuMP.@objective(model, Max, sum(value .* select) + sum(insval .* insert))

        JuMP.optimize!(model)

        if JuMP.termination_status(model)  == JuMP.MOI.OPTIMAL
            @show round.(Int, JuMP.value.(select))
            @show round.(Int, JuMP.value.(insert))
            @show round.(Int, JuMP.value.(colast))
        else
          @show  JuMP.termination_status(model)
        end

Perhaps it is the right solution?

Does your formulation provide sensible results? Or is there something wrong with it?

I haven’t looked at your question in detail, but to the “simple context free” part of your question:

        model = Model()
        @variable(model, select[1:10], Bin)
        @variable(model, 0 <= insert[1:10] <= 10, Int)
        @variable(model, insert_zero[1:10], Bin)

        # insert[i] can be nonzero only if insert_zero[i] == 1
        @constraint(model, [i=1:10], insert[i] <= 10 * insert_zero[i])
        # Monotonicity of insert_zero: the ones form a prefix
        @constraint(model, [i=2:10], insert_zero[i] <= insert_zero[i-1])
        #   i > sum(select) + sum(insert) ⟹ insert_zero[i] == 0
        # ⟺
        #   sum(select) + sum(insert) == sum(insert_zero)
        @constraint(model, sum(select) + sum(insert) == sum(insert_zero))
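
One way to sanity-check that this behaves as intended might be something like the following (the solver, the extra size constraint and the objective are only there for the test):

        import Cbc
        set_optimizer(model, Cbc.Optimizer)
        set_silent(model)
        # Resize to 5 columns and reward late insert positions
        @constraint(model, sum(select) + sum(insert) == 5)
        @objective(model, Max, sum((1:10) .* insert))
        optimize!(model)
        # insert should be zero beyond index 5 (here everything ends up at position 5)
        round.(Int, value.(insert))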

A few JuMP/style points:

        model = JuMP.Model(JuMP.with_optimizer(Cbc.Optimizer, loglevel=0))

You should update to the latest version of JuMP.

        model = JuMP.Model(Cbc.Optimizer)
        JuMP.set_silent(model)

        select = JuMP.@variable(model, select[1:10], Bin)

No need for the select = part; JuMP.@variable(model, select[1:10], Bin) does the same by itself.

        JuMP.@variable(model, insert[1:10], Int, upper_bound=maxins, lower_bound = 0)

can be written as

        JuMP.@variable(model, 0 <= insert[1:10] <= maxins, Int)

        JuMP.@constraint(model, [i=2:10], 0 <= colast[i] - colast[i-1] <= 1)

This simplifies to

        JuMP.@constraint(model, [i=2:10], colast[i] >= colast[i-1])

Thanks a lot!

Does your formulation provide sensible results? Or is there something wrong with it?

It does indeed seem to provide sensible results (assuming I know what is sensible, which might not be true :) ).

I became a bit paranoid after reading some blog post where a seemingly straightforward formulation was orders of magnitude slower than a more convoluted formulation with more variables and constraints. The message was something along the lines of “you need to think like an evil solver which will do everything in its power to explore infeasible paths if you allow it”. Since the formulation in this case is created programmatically based on arbitrary input, and I don't trust my ability to think like an evil solver, I was afraid it might not scale up well due to being non-idiomatic.

Thanks for the pointers on JuMP style too. Very much appreciated!

No need for the select = part; JuMP.@variable(model, select[1:10], Bin) does the same by itself.

In the real code the number of select/insert/insert_zero variables depends on user input (the neural network structure), so I'm using anonymous variables in Dicts which I'm passing around to a couple of “add appropriate constraints for this vertex” functions. I suppose I ended up with a strange mix in the MWE. If there were some way to register a key => variable mapping (for an arbitrary key type) in the model, I could get rid of the Dicts and just pass around the model.
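
For reference, the pattern looks roughly like this (the Vertex struct and the names are made up for the example):

        using JuMP

        struct Vertex
            name::String
            nout::Int
        end

        vertices = [Vertex("conv1", 4), Vertex("conv2", 6)]

        model = Model()
        # Anonymous variables keyed by vertex, passed around to the
        # "add appropriate constraints for this vertex" functions
        selectvars = Dict(v => @variable(model, [1:v.nout], Bin) for v in vertices)
        insertvars = Dict(v => @variable(model, [1:v.nout], Int, lower_bound = 0) for v in vertices)

As far as I know, the model's object dictionary only accepts Symbol keys (model[:some_name] = x), so for arbitrary key types the Dicts seem to be needed anyway.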

Btw, JuMP is really awesome software for bringing mathematical programming into the hands of people who just want to solve some problem with a computer (like me in this case). Without it I would probably have abandoned NaiveNASlib when I realized I couldn't get my initial (non-MIP) attempt to be robust enough. Big thanks to you and everyone else who maintains/develops it!
