 # Querying optimal solution without allocating memory

I am implementing a large-scale decomposition method for solving a huge linear optimization problem. At each iteration of the algorithm, I need to efficiently obtain the optimal primal and dual solution to the LP. The algorithm requires many iterations, and the issue is that considerable memory allocation is taking place each time I query the optimal solution to a JuMP model. Below is a minimal working example to illustrate this issue with memory allocation:

``````using JuMP
using Gurobi

# Generate a random linear program
m = 1000
n = 500
A = rand(m,n)
b = ones(m)

# Preallocate a vector for storing the optimal solution
primal_solution = zeros(n)

# Build the model
model = direct_model(Gurobi.Optimizer())
set_silent(model)
@variable(model, x[j=1:n] >= 0)
@constraint(model, Cons[i=1:m], dot(A[i,:],x) <= b[i])
@objective(model, Max, sum(x[i] for i=1:n))

# Solve the model
optimize!(model)

# Extract the optimal solution (several times)
@time primal_solution .= value.(x)
@time primal_solution .= value.(x)
@time primal_solution .= value.(x)

println("")
``````

The output of the above code is:

`````` 0.000371 seconds (3.50 k allocations: 62.531 KiB)
0.000383 seconds (3.50 k allocations: 62.531 KiB)
0.000386 seconds (3.50 k allocations: 62.531 KiB)
``````

Based on the above output, I conclude that memory allocation is taking place in the heap each time we call the function `value.(x)`.

Is there a way to improve the performance of the above code to remove (or significantly reduce) the need for dynamic memory allocation when querying the optimal primal or dual solutions? For example, is there a way to preallocate the memory to be used by JuMP for computing & returning the optimal solutions? My apologies if this question has already been answered elsewhere!

I haven’t tried, but does this allocate?

``````for j in 1:n
primal_solution[j] = value(x[j])
end
``````

Have you confirmed that these allocations are a bottleneck in your solution algorithm?

I haven’t tried, but does this allocate?
for j in 1:n
primal_solution[j] = value(x[j])
end

This does not seem to make a difference in memory allocation. I will demonstrate this numerically below.

Have you confirmed that these allocations are a bottleneck in your solution algorithm?

Yes, these allocations are indeed the bottleneck. To see why these allocations are the bottleneck, I have added below a minimal (but more realistic) working example. The below example is a straightforward implementation of delayed column generation for solving a large linear optimization problem.

``````using JuMP
using Gurobi
using TimerOutputs

# The function takes a parameter to compare the memory performance
# when using the vectorized vs non-vectorized JuMP function for
# extracting the optimal solution.

function main(use_vectorized_update=false)

###########################################################
# Create a TimerOutput, which we will use to keep track of
# the timing and memory usage of various lines of code
###########################################################
to = TimerOutput()

###########################################################
# Generate a random, sparse linear optimization problem
###########################################################

@timeit to "Generate a random, sparse linear optimization problem" begin
# Set the number of constraints and variables of the master problem
m = 3000
n = 6000

# Create a random, sparse constraint matrix
A = zeros(m,n)
for j=1:n
# Randomly select two entries of the j-th column to be nonzero.
entries = rand(1:m, 2)

# Assign random coefficients to those two entries
A[entries,j] = rand()
A[entries,j] = rand()
end

# Create the right-hand side vector and cost vector.
b = ones(m)
c = ones(n)
end

###########################################################
# Construct a restricted version of the master problem
###########################################################

@timeit to "Initialize the restricted master problem (RMP)" begin

# Preallocate a vector for storing the optimal dual solution
dual_solution = zeros(m)

# Create a variable to store the number of columns which are
# in the restricted master problem.
k=1

# Build the restricted master problem.
model = direct_model(Gurobi.Optimizer())
set_silent(model)
@variable(model, 0 ≤ x[j=1:k] ≤ 1)
@constraint(model, Cons[i=1:m], sum(A[i,j]*x[j] for j=1:k) <= b[i])
@objective(model, Max, sum(c[j]*x[j] for j=1:k))

# Since we are doing column generation, we will tell Gurobi to use
# the primal simplex method.
set_optimizer_attribute(model, "Method", 0)
end

###########################################################
# In each iteration of the algorithm, we solve the restricted
# master problem (RMP), extract the optimal dual solution to
# RMP, use the optimal dual solution to find a column with
# negative reduced cost to add to RMP, and add that column
# to RMP.
###########################################################

while true

# Terminate if we have added all of the columns
if k == n
break
end

# Solve the restricted master problem
@timeit to "Solve restricted master problem using primal simplex" optimize!(model)

# Extract the optimal dual solution to the restricted master problem
if use_vectorized_update
@timeit to "Extract the optimal dual solution (vectorized)" dual_solution .= dual.(Cons)
else
@timeit to "Extract the optimal dual solution (non-vectorized)" begin
for i=1:m
dual_solution[i] = dual(Cons[i])
end
end
end

# Using the dual solution, we would normally search for a column
# (i.e., primal variable) which has negative reduced cost.
# However, for the sake of simplifying this example, below we will
# arbitrarily choose the next (i.e., k+1th) column and add it
# to the restricted master problem.
@timeit to "Add new column to restricted master problem" begin

# Update our counter for the number of columns in the restricted
# master problem.
k += 1

# Add the new column to the restricted master problem.
new_var = @variable(model,
[k],
base_name = "x",
lower_bound = 0,
upper_bound = 1)

# Update the objective function of the restricted master problem.
set_objective_coefficient(model,
new_var[k],
c[k])

# Update the constraints of the restricted master problem
for i=1:m

# To improve the memory performance, only update the i-th
# constraint if the coefficient A[i,k] is nonzero.
if A[i,m] != 0
set_normalized_coefficient(Cons[i],
new_var[k],
A[i,k])
end
end
end
end

###########################################################
# Display the performance of the algorithm.
###########################################################
show(to)
end

``````

Below is the output of the above code when we use the non-vectorized function (i.e., `use_vectorized_update=false`):

``````  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Time                   Allocations
──────────────────────   ───────────────────────
Tot / % measured:                             23.0s / 100%            4.31GiB / 100%

Section                                                 ncalls     time   %tot     avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Extract the optimal dual solution (non-vectorized)       6.00k    16.4s  71.4%  2.73ms   3.39GiB  78.7%   593KiB
Solve restricted master problem using primal simplex     6.00k    5.02s  21.9%   837μs   3.11MiB  0.07%     544B
Add new column to restricted master problem              6.00k    1.36s  5.93%   227μs    789MiB  17.9%   135KiB
Initialize the restricted master problem (RMP)               1   98.1ms  0.43%  98.1ms   10.6MiB  0.24%  10.6MiB
Generate a random, sparse linear optimization problem        1   73.9ms  0.32%  73.9ms    138MiB  3.13%   138MiB
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
``````

And below is the output of the above code when we use the vectorized function (i.e., `use_vectorized_update=true`):

`````` ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Time                   Allocations
──────────────────────   ───────────────────────
Tot / % measured:                             22.0s / 100%            3.07GiB / 100%

Section                                                 ncalls     time   %tot     avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Extract the optimal dual solution (vectorized)           6.00k    14.9s  67.7%  2.48ms   2.15GiB  70.0%   375KiB
Solve restricted master problem using primal simplex     6.00k    5.43s  24.7%   905μs   3.11MiB  0.10%     544B
Add new column to restricted master problem              6.00k    1.47s  6.69%   245μs    789MiB  25.1%   135KiB
Initialize the restricted master problem (RMP)               1    102ms  0.47%   102ms   10.6MiB  0.34%  10.6MiB
Generate a random, sparse linear optimization problem        1   90.6ms  0.41%  90.6ms    138MiB  4.41%   138MiB
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
``````

Based on the above numerical results, I come to the conclusion to that extracting the optimal dual solution is the bottleneck due to its large amount of dynamic memory allocation.

Please let me know if the above example makes sense, and if I can provide additional information!

There’s a few things going on here:

### Type instabilities

Julia can’t infer the type of `Cons`, which slows the code down. You can improve this using a function barrier.

https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions

### Scalarized access by JuMP

JuMP queries each element individually, instead of querying the vector.

Even the Gurobi.jl code at the MOI level has some overheads:

## Solutions

If you’re in direct-mode, you can use:

``````indices = index.(Cons)
dual_solution .= MOI.get(backend(model), MOI.ConstraintDual(), indices)
``````

This skips some of the JuMP checks

If you’re happy with pure Gurobi, and you know how Gurobi orders the rows in the constraint matrix you can also do

``````GRBgetdblattrarray(
backend(model),
"Pi",
0,
length(dual_solution),
dual_solution,
)
``````

Here’s my solution

``````using JuMP
using Gurobi
using TimerOutputs

function main(use_vectorized_update=false)
to = TimerOutput()
@timeit to "Generate a random, sparse linear optimization problem" begin
m = 3000
n = 6000
A = zeros(m, n)
for j=1:n
entries = rand(1:m, 2)
A[entries,j] = rand()
A[entries,j] = rand()
end
b = ones(m)
c = ones(n)
end
@timeit to "Initialize the restricted master problem (RMP)" begin
dual_solution = zeros(m)
k = 1
model = direct_model(Gurobi.Optimizer())::JuMP.Model
set_silent(model)
@variable(model, 0 <= x[1:k] <= 1)
@constraint(model, Cons[i = 1:m], sum(A[i, j] * x[j] for j in 1:k) <= b[i])
@objective(model, Max, sum(c[j] * x[j] for j in 1:k))
set_optimizer_attribute(model, "Method", 0)
end
return main(use_vectorized_update, to, model, dual_solution, Cons, A, c)
end

function main(use_vectorized_update, to, model, dual_solution, Cons, A, c)
indices = index.(Cons)
m, n = size(A)
for k in 2:n
@timeit to "Solve restricted master problem using primal simplex" begin
optimize!(model)
end
if use_vectorized_update
@timeit to "Extract the optimal dual solution (vectorized)" begin
dual_solution .= MOI.get(backend(model), MOI.ConstraintDual(), indices)
end
else
@timeit to "Extract the optimal dual solution (non-vectorized)" begin
GRBgetdblattrarray(
backend(model),
"Pi",
0,
length(dual_solution),
dual_solution,
)
end
end
@timeit to "Add new column to restricted master problem" begin
new_var = @variable(
model,
base_name = "x",
lower_bound = 0,
upper_bound = 1,
)
set_objective_coefficient(model, new_var, c[k])
for i = 1:m
if A[i, m] != 0
set_normalized_coefficient(Cons[i], new_var, A[i, k])
end
end
end
end
to, model
end

@time to, model = main(true)
@time to, model = main(false)
``````

This drops things down to

``````────────────────────────────────────────────────────────────────────────────────────────
Time                   Allocations
──────────────────────   ───────────────────────
Tot / % measured:                 5.26s / 99.3%            148MiB / 100%

Section                         ncalls     time   %tot     avg     alloc   %tot      avg
────────────────────────────────────────────────────────────────────────────────────────
Solve restricted master prob...  6.00k    5.05s  96.6%   841μs   3.30MiB  2.23%     576B
Add new column to restricted...  6.00k   83.4ms  1.60%  13.9μs   2.32MiB  1.57%     406B
Generate a random, sparse li...      1   66.2ms  1.27%  66.2ms    138MiB  93.8%   138MiB
Extract the optimal dual sol...  6.00k   21.2ms  0.41%  3.54μs   93.7KiB  0.06%    16.0B
Initialize the restricted ma...      1   6.84ms  0.13%  6.84ms   3.44MiB  2.33%  3.44MiB
────────────────────────────────────────────────────────────────────────────────────────
``````
2 Likes

This is outstanding! My sincerest thanks once again for your help.

1 Like