What JuMP algorithm to use for this optimization problem?

jump

#1

I have a question about what type of algorithm I should be using for a fairly straightforward optimization problem.

I have the following data

id,val,score,id1
1,9,0,64
2,4,0,7
3,11,12,53
4,7,0,8
5,4,35,11
6,14,0,31
7,13,29,2
8,10,11,4
9,2,18,18
10,5,6,38
...

I want to choose 3 “good” ids and 3 “bad” ids. Without loss of generality, number these 1,2,3 and 4,5,6 respectively. Choose these 6 ids to maximize:

objective = score[1]+score[2]+score[3]-score[4]-score[5]-score[6]

subject to

constraint1: val[1]+val[2]+val[3] >= val[4]+val[5]+val[6]
constraint2: id[1] !in [id1[4],id1[5],id1[6]]
constraint3: id[2] !in [id1[4],id1[5],id1[6]]
constraint4: id[3] !in [id1[4],id1[5],id1[6]]

This seems like something JuMP would very easily handle. I’m just not quite sure where to start. Any help would be greatly appreciated!


Difficulty to understand the JuMP problem setting
#2

Yes, I think JuMP would work very well here. Your problem looks like it could be solved as a mixed-integer linear program (MILP). Solvers like GLPK (free) and Gurobi (commercial) handle problems in this form and work with JuMP.

I would suggest creating two binary variables for each ID: is_good[1:N] and is_bad[1:N]. Your last three constraints are just something like:

is_good[1:N] .+ is_bad[1:N] .<= 1

and your objective function is, roughly:

score[i] * is_good[i] - score[i] * is_bad[i] for i in 1:N

I can provide more details if you need, but hopefully this should at least get you going in the right direction.


#3

Thank you! I’ll give it a shot.


#4

Here’s my first (unsuccessful) attempt at it:

using DataFrames
using HTTP
using uCSV
using JuMP
using GLPK
using Cbc

# read in the data
fname = "https://gist.githubusercontent.com/tyleransom/3738fa6cd1b3b5602e0621b08d068380/raw/5c4b139639d7b3c16716289b0232b8528d1bf494/mwe.csv"
data = DataFrame(uCSV.read(IOBuffer(HTTP.get(fname).body), quotes='"', header=1))
data = Matrix(data)

# id    = data[:,1]
# val   = data[:,2]
# score = data[:,3]
# id1   = data[:,4]

# take a subset of the data just to compare (can figure out by hand on smaller data)
data = data[1:14,:] 

#solver = GLPKSolverLP()
solver = CbcSolver()

# Solve model
function SolveModel(data)
    N = size(data,1) 
    m = Model(solver=solver)

    @variable(m, is_good[1:N], Bin)
    @variable(m,  is_bad[1:N], Bin)

    @objective(m, Max, data[i,3] * is_good[i] - data[i,3] * is_bad[i] for i in 1:N)

    @constraints m begin
        # Constraint 1 - sum of good vals >= sum of bad vals
        data[i,2] * is_good[i] .>= data[i,2] * is_bad[i]
        # Constraint 2 - can't choose the same id for both good and bad
        is_good[1:N] .+ is_bad[1:N] .<= 1
    end

    # Solve it
    status = solve(m)
    println(getvalue(is_good))
    println(getvalue(is_bad))
end

@time SolveModel(data)

Full code in gist from is here. I’m getting an error:

ERROR: LoadError: MethodError: no method matching *(::Float64, ::Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##9#10")){Array{Int64,2},Array{Variable,1},Array{Variable,1}}})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:502
  *(::Float64, ::Float64) at float.jl:399
  *(::AbstractFloat, ::Bool) at bool.jl:120

Full StackTrace is available here.

I really appreciate your help!


#5
@objective(m, Max, data[i,3] * is_good[i] - data[i,3] * is_bad[i] for i in 1:N)

sum?

data[i,2] * is_good[i] .>= data[i,2] * is_bad[i]

sum(data[i,2] * is_good[i] for i = 1 : N) >= sum(data[i,2] * is_bad[i] for i = 1 : N)?


#6

Thanks! That did the trick.