I have an optimisation of the following form.
rows = 6000
cols = 10
# Create model with solver
model = Model(with_optimizer(GLPK.Optimizer))
# Create binary variables
@variable(model, X[1:rows, 1:cols], Bin)
# Objective - maximise V
V = rand(rows, cols)
@objective(model, Max, sum(X .* V));
# Constraint - Each row must only have one entry
for i in 1:rows
@constraint(model, sum(X[i, :]) <= 1)
# Constraint - Each column sums to between min and max
minvols = rand(200:400, rows)
maxvols = rand(600:800, cols)
for j in 1:cols
@constraint(model, minvols[j] <= sum(X[:, j]) <= maxvols[j])
My problem is that when the data set gets much larger - e.g. if
rows is around 500,000 - the solver takes far too long.
Does anyone have any advice on how to approach a problem like this?
rows = 500_000 and
cols = 10, you have 5 million binary variables. That’s a tricky problem.
You could try Cbc.jl, or, if you are an academic, a commercial solver like CPLEX.jl.
I solve problems this big on a regular basis for my job. It is notoriously difficult to predict how long it will take to solve such problems, and they are NP-hard in the general case. We use Gurobi which is excellent, though it is a commercial solver. The open source options are disappointing, but Cbc is probably worth a shot (you may have to do a lot of tweaking of it).
It’s also usually helpful to identify which step is taking a long time: is it the root relaxation, or branch and bound? Sometimes it’s even the generation of cutting planes before the branch and bound step that takes a long time.
Also, just couldn’t help myself, one of the things I love about JuMP and Julia is how easy it is to write efficient problems that look just like they do on paper:
const N = 5*10^5
const M = 10
m = Model()
𝟏 = ones(N)
𝟙 = ones(M)
@variable(m, X[1:N, 1:M], Bin)
@constraint(m, X*𝟙 .≤ 1)
mₗ = rand(200:400, M)
mᵤ = rand(600:800, M)
@constraint(m, mₗ .≤ X'*𝟏 .≤ mᵤ)
V = rand(N, M)
@objective(m, Max, sum(X .* V))
This benefit really adds up when you have big complicated problems, and can easily read them.
Thank you all! much appreciated.
I did find that Cbc was quicker. And I also found that the relaxed problem is much quicker and often equal to the binary problem. I did this by modifying the variables to:
@variable(model, 0 <= X[1:rows, 1:cols] <= 1)
I see this used to be possible pre v0.19 by setting
relaxation=True within the
solve function. And there is a feature request: