What JuMP algorithm to use for this optimization problem?

I have a question about what type of algorithm I should be using for a fairly straightforward optimization problem.

I have the following data

``````id,val,score,id1
1,9,0,64
2,4,0,7
3,11,12,53
4,7,0,8
5,4,35,11
6,14,0,31
7,13,29,2
8,10,11,4
9,2,18,18
10,5,6,38
...
``````

I want to choose 3 “good” `id`s and 3 “bad” `id`s. Without loss of generality, number these 1,2,3 and 4,5,6 respectively. Choose these 6 `id`s to maximize:

`objective = score[1]+score[2]+score[3]-score[4]-score[5]-score[6]`

subject to

``````constraint1: val[1]+val[2]+val[3] >= val[4]+val[5]+val[6]
constraint2: id[1] !in [id1[4],id1[5],id1[6]]
constraint3: id[2] !in [id1[4],id1[5],id1[6]]
constraint4: id[3] !in [id1[4],id1[5],id1[6]]
``````

This seems like something JuMP would very easily handle. I’m just not quite sure where to start. Any help would be greatly appreciated!

Yes, I think JuMP would work very well here. Your problem looks like it could be solved as a mixed-integer linear program (MILP). Solvers like GLPK (free) and Gurobi (commercial) handle problems in this form and work with JuMP.

I would suggest creating two binary variables for each ID: `is_good[1:N]` and `is_bad[1:N]`. Your last three constraints are just something like:

``````is_good[1:N] .+ is_bad[1:N] .<= 1
``````

and your objective function is, roughly:

``````score[i] * is_good[i] - score[i] * is_bad[i] for i in 1:N
``````

I can provide more details if you need, but hopefully this should at least get you going in the right direction.

4 Likes

Thank you! I’ll give it a shot.

Here’s my first (unsuccessful) attempt at it:

``````using DataFrames
using HTTP
using uCSV
using JuMP
using GLPK
using Cbc

# read in the data
fname = "https://gist.githubusercontent.com/tyleransom/3738fa6cd1b3b5602e0621b08d068380/raw/5c4b139639d7b3c16716289b0232b8528d1bf494/mwe.csv"
data = DataFrame(uCSV.read(IOBuffer(HTTP.get(fname).body), quotes='"', header=1))
data = Matrix(data)

# id    = data[:,1]
# val   = data[:,2]
# score = data[:,3]
# id1   = data[:,4]

# take a subset of the data just to compare (can figure out by hand on smaller data)
data = data[1:14,:]

#solver = GLPKSolverLP()
solver = CbcSolver()

# Solve model
function SolveModel(data)
N = size(data,1)
m = Model(solver=solver)

@variable(m, is_good[1:N], Bin)
@variable(m,  is_bad[1:N], Bin)

@objective(m, Max, data[i,3] * is_good[i] - data[i,3] * is_bad[i] for i in 1:N)

@constraints m begin
# Constraint 1 - sum of good vals >= sum of bad vals
data[i,2] * is_good[i] .>= data[i,2] * is_bad[i]
# Constraint 2 - can't choose the same id for both good and bad
is_good[1:N] .+ is_bad[1:N] .<= 1
end

# Solve it
status = solve(m)
println(getvalue(is_good))
println(getvalue(is_bad))
end

@time SolveModel(data)
``````

Full code in gist from is here. I’m getting an error:

``````ERROR: LoadError: MethodError: no method matching *(::Float64, ::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##9#10")){Array{Int64,2},Array{Variable,1},Array{Variable,1}}})
Closest candidates are:
*(::Any, ::Any, ::Any, ::Any...) at operators.jl:502
*(::Float64, ::Float64) at float.jl:399
*(::AbstractFloat, ::Bool) at bool.jl:120
``````

Full StackTrace is available here.

I really appreciate your help!

``````@objective(m, Max, data[i,3] * is_good[i] - data[i,3] * is_bad[i] for i in 1:N)
``````

`sum`?

``````data[i,2] * is_good[i] .>= data[i,2] * is_bad[i]
``````

`sum(data[i,2] * is_good[i] for i = 1 : N) >= sum(data[i,2] * is_bad[i] for i = 1 : N)`?

2 Likes

Thanks! That did the trick.