I suppose this might be more of a generic MIP formulation question, but I’m not sure stackexchange would accept it and in here I can at least provide some almost working code using JuMP.
Simple context free question is that I have one binary variable array select
and one integer variable array (with lower bound 0) insert
of the same size and I would like to constrain insert
so that insert[i] == 0
for all i > sum(select) + sum(insert)
. Are there any MIP-tricks which can achieve this?
Context which might help find alternate formulations:
NaiveNASlib has functions to modify the structure of a neural network ensure that a) dimensions of parameter arrays stay consistent and b) that the model to the largest extent possible represents the same function as before. To be able to handle any (?) kind of architecture it uses a MIP formulation. I’m looking at making the formulation a bit more generic by loosening up some constraints, but I’m not sure exactly how to go about it.
The current way is to by create variables select
and insert
for each vertex in the computation graph where both select
and insert
are arrays of binary variables. To keep things simple here, assume the parameter array is a matrix, then select[i]
means that we shall keep column i
from that matrix and insert[i]
means that we shall insert a new column after column i
. As a concrete example, the result select = [0, 1, 1, 0, 1]
and insert = [0, 0, 1, 0, 0]
means that the new matrix has four columns where Mnew[:, 1] = Mold[:, 2], Mnew[:, 2] = Mold[:, 3], Mnew[:, 3] .= 0, Mnew[:, 4] = Mold[:, 5]
.
Some extra context which are a little bit besides the question: The way the goals a) and b) are met is by adding constraints based on what category of operation each vertex belongs to. For example, if we hit a vertex V
which does elementwise operation of the inputs, the constraints that select_V .== select_Vi0 .== select_Vi1 ….
where select_V
is the select
variable for the output of V
and select_Vin
is the select
variable for input n
to V
. The same is done for insert
too of course. If a concatenation is hit we instead make constraints for that by making portions of select_V
equal to the corresponding inputs. For both elementwise and concatenation there are typically no parameters to select from V
, but the select_V
variable is used to propagate the constraints to tie everything together (e.g. to handle multiple levels of nested elementwise ops and concatenations).
The first constraint I would like to relax is that currently sum(select) + sum(insert) == length(insert)
. This constraint is a crude way to protect against (what I think are) impossible outcomes, such as “select columns 2,4,5 and insert a new column at position 20” (what should one do with columns 4-19?).
The second constraint I would like to relax is to make insert
an integer variable (with lower bound 0 of course). This would allow the optimizer to insert any number of neurons instead of being artificially restricted to the length of the insert
variable (although I might want to add an upper bound to the sum to prevent models from randomly exploding in size).
The first seems to be the one which is trickier, but the second might add extra complications. To disallow gaps, I would like to have a constraint which says that the total number of selected and inserted must be larger or equal to the largest non-zero index of insert, but how to formulate that?
I have experimented a bit with an auxiliary binary variable array which is constrained to be consecutive and non-zero where both insert
and select
are zero. It might work well enough, but it imposes the unnecessary constraint that when select[i] = 0
then insert[i] > 0
. Example:
import JuMP
import Cbc
import JuMP: @variable, @constraint, @objective, @expression
model = JuMP.Model(JuMP.with_optimizer(Cbc.Optimizer, loglevel=0))
# Which of the existing ten neurons do we want to select
select = JuMP.@variable(model, select[1:10], Bin)
# Max number of neurons we allow to insert (needed both to prevent models from exploding and for big-M strategy below to work)
maxins = 10
# insert[i] = How many new neurons to insert at position i
insert = JuMP.@variable(model, insert[1:10], Int, upper_bound=maxins, lower_bound = 0)
# Number here is specified by user (e.g. I want to resize this layer to only have 8 neurons).
JuMP.@constraint(model, sum(select) + sum(insert) == 8)
# Constrain last (shortened so printout aligns) This should force the optimizer to make sure that select[i] == 0 or insert[i] == 0 is only allowed in the last part, i.e there are no gaps
colast = JuMP.@variable(model, colast[1:10], Bin)
# colast[i] is one if both select[i] or insert[i] is zero using a big M strategy.
JuMP.@constraint(model, [i = 1:10], 2maxins * (1-colast[i]) <= 2maxins * (select[i] + insert[i]))
JuMP.@constraint(model, [i = 1:10], 2maxins * (1-colast[i]) >= select[i] + insert[i])
# colast[i] can also only be one if colast[i-1] is one, iow the last consecutive colast must be one (if any).
JuMP.@constraint(model, [i=2:10], 0 <= colast[i] - colast[i-1] <= 1)
# value[i] = score for select[i], lets make the last neurons more attractive here
value = collect(1:10)
value[6] = -100 # Avoid select[6], colast should force us to insert at this point instead of just skipping it
JuMP.@objective(model, Max, sum(value .* select))
JuMP.optimize!(model)
if JuMP.termination_status(model) == JuMP.MOI.OPTIMAL
@show round.(Int, JuMP.value.(select))
@show round.(Int, JuMP.value.(insert))
@show round.(Int, JuMP.value.(colast))
else
@show JuMP.termination_status(model)
end