I am implementing a large-scale decomposition method for solving a huge linear optimization problem. At each iteration of the algorithm, I need to efficiently obtain the optimal primal and dual solution to the LP. The algorithm requires many iterations, and the issue is that considerable memory allocation is taking place each time I query the optimal solution to a JuMP model. Below is a minimal working example to illustrate this issue with memory allocation:
using JuMP
using Gurobi
using LinearAlgebra  # needed for dot
# Generate a random linear program
m = 1000
n = 500
A = rand(m,n)
b = ones(m)
# Preallocate a vector for storing the optimal solution
primal_solution = zeros(n)
# Build the model
model = direct_model(Gurobi.Optimizer())
set_silent(model)
@variable(model, x[j=1:n] >= 0)
@constraint(model, Cons[i=1:m], dot(A[i,:],x) <= b[i])
@objective(model, Max, sum(x[i] for i=1:n))
# Solve the model
optimize!(model)
# Extract the optimal solution (several times)
@time primal_solution .= value.(x)
@time primal_solution .= value.(x)
@time primal_solution .= value.(x)
println("")
The output of the above code is:
0.000371 seconds (3.50 k allocations: 62.531 KiB)
0.000383 seconds (3.50 k allocations: 62.531 KiB)
0.000386 seconds (3.50 k allocations: 62.531 KiB)
Based on the above output, I conclude that heap allocation takes place on every call to value.(x).
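To double-check where the bytes come from, here is a quick measurement with BenchmarkTools (a minimal sketch, assuming the package is installed; interpolating the globals with $ excludes global-variable overhead from the timing):
using BenchmarkTools
# Benchmark the in-place copy of the optimal solution.
@btime $primal_solution .= value.($x)
# Bytes allocated by a single query of the solution vector.
@ballocated value.($x)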
Is there a way to improve the performance of the above code to remove (or significantly reduce) the need for dynamic memory allocation when querying the optimal primal or dual solutions? For example, is there a way to preallocate the memory to be used by JuMP for computing & returning the optimal solutions? My apologies if this question has already been answered elsewhere!
I haven't tried, but does this allocate?
for j in 1:n
primal_solution[j] = value(x[j])
end
This does not seem to make a difference in memory allocation. I will demonstrate this numerically below.
Have you confirmed that these allocations are a bottleneck in your solution algorithm?
Yes, these allocations are indeed the bottleneck. To demonstrate this, below is a minimal (but more realistic) working example: a straightforward implementation of delayed column generation for solving a large linear optimization problem.
using JuMP
using Gurobi
using TimerOutputs
# The function takes a parameter to compare the memory performance
# when using the vectorized vs non-vectorized JuMP function for
# extracting the optimal solution.
function main(use_vectorized_update=false)
###########################################################
# Create a TimerOutput, which we will use to keep track of
# the timing and memory usage of various lines of code
###########################################################
to = TimerOutput()
###########################################################
# Generate a random, sparse linear optimization problem
###########################################################
@timeit to "Generate a random, sparse linear optimization problem" begin
# Set the number of constraints and variables of the master problem
m = 3000
n = 6000
# Create a random, sparse constraint matrix
A = zeros(m,n)
for j=1:n
# Randomly select two entries of the j-th column to be nonzero.
entries = rand(1:m, 2)
# Assign random coefficients to those two entries
A[entries[1],j] = rand()
A[entries[2],j] = rand()
end
# Create the right-hand side vector and cost vector.
b = ones(m)
c = ones(n)
end
###########################################################
# Construct a restricted version of the master problem
###########################################################
@timeit to "Initialize the restricted master problem (RMP)" begin
# Preallocate a vector for storing the optimal dual solution
dual_solution = zeros(m)
# Create a variable to store the number of columns which are
# in the restricted master problem.
k=1
# Build the restricted master problem.
model = direct_model(Gurobi.Optimizer())
set_silent(model)
@variable(model, 0 <= x[j=1:k] <= 1)
@constraint(model, Cons[i=1:m], sum(A[i,j]*x[j] for j=1:k) <= b[i])
@objective(model, Max, sum(c[j]*x[j] for j=1:k))
# Since we are doing column generation, we will tell Gurobi to use
# the primal simplex method.
set_optimizer_attribute(model, "Method", 0)
end
###########################################################
# In each iteration of the algorithm, we solve the restricted
# master problem (RMP), extract the optimal dual solution to
# RMP, use the optimal dual solution to find a column with
# negative reduced cost to add to RMP, and add that column
# to RMP.
###########################################################
while true
# Terminate if we have added all of the columns
if k == n
break
end
# Solve the restricted master problem
@timeit to "Solve restricted master problem using primal simplex" optimize!(model)
# Extract the optimal dual solution to the restricted master problem
if use_vectorized_update
@timeit to "Extract the optimal dual solution (vectorized)" dual_solution .= dual.(Cons)
else
@timeit to "Extract the optimal dual solution (non-vectorized)" begin
for i=1:m
dual_solution[i] = dual(Cons[i])
end
end
end
# Using the dual solution, we would normally search for a column
# (i.e., primal variable) which has negative reduced cost.
# However, for the sake of simplifying this example, below we will
# arbitrarily choose the next (i.e., k+1th) column and add it
# to the restricted master problem.
@timeit to "Add new column to restricted master problem" begin
# Update our counter for the number of columns in the restricted
# master problem.
k += 1
# Add the new column to the restricted master problem.
new_var = @variable(model,
[k],
base_name = "x",
lower_bound = 0,
upper_bound = 1)
# Update the objective function of the restricted master problem.
set_objective_coefficient(model,
new_var[k],
c[k])
# Update the constraints of the restricted master problem
for i=1:m
# To improve the memory performance, only update the i-th
# constraint if the coefficient A[i,k] is nonzero.
if A[i,k] != 0
set_normalized_coefficient(Cons[i],
new_var[k],
A[i,k])
end
end
end
end
###########################################################
# Display the performance of the algorithm.
###########################################################
show(to)
end
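To reproduce the outputs below, the function is called once with each flag:
main(false)  # extract the duals with the non-vectorized loop
main(true)   # extract the duals with the vectorized dual.(Cons)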
Below is the output of the above code when we use the non-vectorized function (i.e., use_vectorized_update=false):
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                          Time                    Allocations
                                                                  ──────────────────────   ───────────────────────
                 Tot / % measured:                                     23.0s / 100%             4.31GiB / 100%

Section                                                   ncalls     time   %tot      avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Extract the optimal dual solution (non-vectorized)         6.00k    16.4s  71.4%   2.73ms   3.39GiB  78.7%   593KiB
Solve restricted master problem using primal simplex       6.00k    5.02s  21.9%    837μs   3.11MiB  0.07%     544B
Add new column to restricted master problem                6.00k    1.36s  5.93%    227μs    789MiB  17.9%   135KiB
Initialize the restricted master problem (RMP)                 1   98.1ms  0.43%   98.1ms   10.6MiB  0.24%  10.6MiB
Generate a random, sparse linear optimization problem          1   73.9ms  0.32%   73.9ms    138MiB  3.13%   138MiB
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
And below is the output of the above code when we use the vectorized function (i.e., use_vectorized_update=true):
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                          Time                    Allocations
                                                                  ──────────────────────   ───────────────────────
                 Tot / % measured:                                     22.0s / 100%             3.07GiB / 100%

Section                                                   ncalls     time   %tot      avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Extract the optimal dual solution (vectorized)             6.00k    14.9s  67.7%   2.48ms   2.15GiB  70.0%   375KiB
Solve restricted master problem using primal simplex       6.00k    5.43s  24.7%    905μs   3.11MiB  0.10%     544B
Add new column to restricted master problem                6.00k    1.47s  6.69%    245μs    789MiB  25.1%   135KiB
Initialize the restricted master problem (RMP)                 1    102ms  0.47%    102ms   10.6MiB  0.34%  10.6MiB
Generate a random, sparse linear optimization problem          1   90.6ms  0.41%   90.6ms    138MiB  4.41%   138MiB
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Based on the above numerical results, I conclude that extracting the optimal dual solution is the bottleneck due to its large amount of dynamic memory allocation.
Please let me know if the above example makes sense, and if I can provide additional information!
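One way to remove most of these allocations is to restructure the code: build the model in one function, then hand everything to a second method behind a function barrier so the solve loop is type-stable, and extract the duals either with a single MOI call over precomputed constraint indices or directly through Gurobi's C API into the preallocated vector. A rewrite along these lines: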
using JuMP
using Gurobi
using TimerOutputs
function main(use_vectorized_update=false)
to = TimerOutput()
@timeit to "Generate a random, sparse linear optimization problem" begin
m = 3000
n = 6000
A = zeros(m, n)
for j=1:n
entries = rand(1:m, 2)
A[entries[1],j] = rand()
A[entries[2],j] = rand()
end
b = ones(m)
c = ones(n)
end
@timeit to "Initialize the restricted master problem (RMP)" begin
dual_solution = zeros(m)
k = 1
model = direct_model(Gurobi.Optimizer())::JuMP.Model
set_silent(model)
@variable(model, 0 <= x[1:k] <= 1)
@constraint(model, Cons[i = 1:m], sum(A[i, j] * x[j] for j in 1:k) <= b[i])
@objective(model, Max, sum(c[j] * x[j] for j in 1:k))
set_optimizer_attribute(model, "Method", 0)
end
# Pass everything to a second method (a function barrier) so that the
# solve loop below is compiled for the concrete types of its arguments.
return main(use_vectorized_update, to, model, dual_solution, Cons, A, c)
end
function main(use_vectorized_update, to, model, dual_solution, Cons, A, c)
# Cache the MOI indices of the constraints so that all duals can be
# queried with a single MOI call.
indices = index.(Cons)
m, n = size(A)
for k in 2:n
@timeit to "Solve restricted master problem using primal simplex" begin
optimize!(model)
end
if use_vectorized_update
@timeit to "Extract the optimal dual solution (vectorized)" begin
# A single MOI call returns all duals as a newly allocated vector,
# which is then copied into the preallocated buffer.
dual_solution .= MOI.get(backend(model), MOI.ConstraintDual(), indices)
end
else
@timeit to "Extract the optimal dual solution (non-vectorized)" begin
# Query Gurobi's "Pi" attribute (the constraint duals) through the
# C API, writing directly into the preallocated vector.
GRBgetdblattrarray(
backend(model),
"Pi",
0,
length(dual_solution),
dual_solution,
)
end
end
@timeit to "Add new column to restricted master problem" begin
new_var = @variable(
model,
base_name = "x",
lower_bound = 0,
upper_bound = 1,
)
set_objective_coefficient(model, new_var, c[k])
for i = 1:m
if A[i, k] != 0
set_normalized_coefficient(Cons[i], new_var, A[i, k])
end
end
end
end
return to, model
end
@time to, model = main(true)
@time to, model = main(false)
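A few notes on the rewrite. The function barrier means the solve loop is compiled for the concrete types of model, Cons, and the other arguments instead of repeatedly dispatching on them. The GRBgetdblattrarray call writes the duals straight into the preallocated vector with no intermediate allocation, but it works here only because direct_model makes backend(model) the Gurobi.Optimizer itself; with JuMP's default cached optimizer you would have to go through MOI.get instead. Be aware as well that the raw "Pi" attribute follows Gurobi's sign convention, which for a maximization problem can differ in sign from the duals JuMP reports, and that the raw query bypasses JuMP's status checks, so only read it after a successful solve.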