What JuMP algorithm to use for this optimization problem?



I have a question about what type of algorithm I should be using for a fairly straightforward optimization problem.

I have the following data


I want to choose 3 “good” ids and 3 “bad” ids. Without loss of generality, number these 1,2,3 and 4,5,6 respectively. Choose these 6 ids to maximize:

objective = score[1]+score[2]+score[3]-score[4]-score[5]-score[6]

subject to

constraint1: val[1]+val[2]+val[3] >= val[4]+val[5]+val[6]
constraint2: id[1] !in [id1[4],id1[5],id1[6]]
constraint3: id[2] !in [id1[4],id1[5],id1[6]]
constraint4: id[3] !in [id1[4],id1[5],id1[6]]

This seems like something JuMP would very easily handle. I’m just not quite sure where to start. Any help would be greatly appreciated!

Difficulty to understand the JuMP problem setting

Yes, I think JuMP would work very well here. Your problem looks like it could be solved as a mixed-integer linear program (MILP). Solvers like GLPK (free) and Gurobi (commercial) handle problems in this form and work with JuMP.

I would suggest creating two binary variables for each ID: is_good[1:N] and is_bad[1:N]. Your last three constraints are just something like:

is_good[1:N] .+ is_bad[1:N] .<= 1

and your objective function is, roughly:

score[i] * is_good[i] - score[i] * is_bad[i] for i in 1:N

I can provide more details if you need, but hopefully this should at least get you going in the right direction.


Thank you! I’ll give it a shot.


Here’s my first (unsuccessful) attempt at it:

using DataFrames
using HTTP
using uCSV
using JuMP
using GLPK
using Cbc

# read in the data
fname = "https://gist.githubusercontent.com/tyleransom/3738fa6cd1b3b5602e0621b08d068380/raw/5c4b139639d7b3c16716289b0232b8528d1bf494/mwe.csv"
data = DataFrame(uCSV.read(IOBuffer(HTTP.get(fname).body), quotes='"', header=1))
data = Matrix(data)

# id    = data[:,1]
# val   = data[:,2]
# score = data[:,3]
# id1   = data[:,4]

# take a subset of the data just to compare (can figure out by hand on smaller data)
data = data[1:14,:] 

#solver = GLPKSolverLP()
solver = CbcSolver()

# Solve model
function SolveModel(data)
    N = size(data,1) 
    m = Model(solver=solver)

    @variable(m, is_good[1:N], Bin)
    @variable(m,  is_bad[1:N], Bin)

    @objective(m, Max, data[i,3] * is_good[i] - data[i,3] * is_bad[i] for i in 1:N)

    @constraints m begin
        # Constraint 1 - sum of good vals >= sum of bad vals
        data[i,2] * is_good[i] .>= data[i,2] * is_bad[i]
        # Constraint 2 - can't choose the same id for both good and bad
        is_good[1:N] .+ is_bad[1:N] .<= 1

    # Solve it
    status = solve(m)

@time SolveModel(data)

Full code in gist from is here. I’m getting an error:

ERROR: LoadError: MethodError: no method matching *(::Float64, ::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##9#10")){Array{Int64,2},Array{Variable,1},Array{Variable,1}}})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:502
  *(::Float64, ::Float64) at float.jl:399
  *(::AbstractFloat, ::Bool) at bool.jl:120

Full StackTrace is available here.

I really appreciate your help!

@objective(m, Max, data[i,3] * is_good[i] - data[i,3] * is_bad[i] for i in 1:N)


data[i,2] * is_good[i] .>= data[i,2] * is_bad[i]

sum(data[i,2] * is_good[i] for i = 1 : N) >= sum(data[i,2] * is_bad[i] for i = 1 : N)?


Thanks! That did the trick.