Querying optimal solution without allocating memory

I am implementing a large-scale decomposition method for solving a huge linear optimization problem. At each iteration of the algorithm, I need to efficiently obtain the optimal primal and dual solution to the LP. The algorithm requires many iterations, and the issue is that considerable memory allocation is taking place each time I query the optimal solution to a JuMP model. Below is a minimal working example to illustrate this issue with memory allocation:

using JuMP
using Gurobi

# Generate a random linear program
m = 1000
n = 500
A = rand(m,n)
b = ones(m)

# Preallocate a vector for storing the optimal solution
primal_solution = zeros(n)

# Build the model
model = direct_model(Gurobi.Optimizer())
set_silent(model)
@variable(model, x[j=1:n] >= 0)
@constraint(model, Cons[i=1:m], dot(A[i,:],x) <= b[i])
@objective(model, Max, sum(x[i] for i=1:n))

# Solve the model
optimize!(model)

# Extract the optimal solution (several times)
@time primal_solution .= value.(x)
@time primal_solution .= value.(x)
@time primal_solution .= value.(x)

println("")

The output of the above code is:

 0.000371 seconds (3.50 k allocations: 62.531 KiB)
 0.000383 seconds (3.50 k allocations: 62.531 KiB)
 0.000386 seconds (3.50 k allocations: 62.531 KiB)

Based on the above output, I conclude that memory allocation is taking place in the heap each time we call the function value.(x).

Is there a way to improve the performance of the above code to remove (or significantly reduce) the need for dynamic memory allocation when querying the optimal primal or dual solutions? For example, is there a way to preallocate the memory to be used by JuMP for computing & returning the optimal solutions? My apologies if this question has already been answered elsewhere!

I haven’t tried, but does this allocate?

for j in 1:n
    primal_solution[j] = value(x[j])
end

Have you confirmed that these allocations are a bottleneck in your solution algorithm?

I haven’t tried, but does this allocate?
for j in 1:n
primal_solution[j] = value(x[j])
end

This does not seem to make a difference in memory allocation. I will demonstrate this numerically below.

Have you confirmed that these allocations are a bottleneck in your solution algorithm?

Yes, these allocations are indeed the bottleneck. To see why these allocations are the bottleneck, I have added below a minimal (but more realistic) working example. The below example is a straightforward implementation of delayed column generation for solving a large linear optimization problem.

using JuMP
using Gurobi
using TimerOutputs

# The function takes a parameter to compare the memory performance
# when using the vectorized vs non-vectorized JuMP function for 
# extracting the optimal solution.

function main(use_vectorized_update=false)

    ###########################################################
    # Create a TimerOutput, which we will use to keep track of 
    # the timing and memory usage of various lines of code
    ###########################################################
    to = TimerOutput()


    ###########################################################
    # Generate a random, sparse linear optimization problem
    ###########################################################

    @timeit to "Generate a random, sparse linear optimization problem" begin
        # Set the number of constraints and variables of the master problem
        m = 3000  
        n = 6000 
    
        # Create a random, sparse constraint matrix
        A = zeros(m,n)
        for j=1:n
            # Randomly select two entries of the j-th column to be nonzero.
            entries = rand(1:m, 2)
    
            # Assign random coefficients to those two entries
            A[entries[1],j] = rand()
            A[entries[2],j] = rand()
        end
    
        # Create the right-hand side vector and cost vector.
        b = ones(m)
        c = ones(n)
    end


    ###########################################################
    # Construct a restricted version of the master problem
    ###########################################################
    
    @timeit to "Initialize the restricted master problem (RMP)" begin
     
        # Preallocate a vector for storing the optimal dual solution
        dual_solution = zeros(m)
       
        # Create a variable to store the number of columns which are
        # in the restricted master problem.
        k=1

        # Build the restricted master problem.
        model = direct_model(Gurobi.Optimizer())
        set_silent(model)
        @variable(model, 0 ≀ x[j=1:k] ≀ 1)
        @constraint(model, Cons[i=1:m], sum(A[i,j]*x[j] for j=1:k) <= b[i])
        @objective(model, Max, sum(c[j]*x[j] for j=1:k))
        
        # Since we are doing column generation, we will tell Gurobi to use 
        # the primal simplex method.
        set_optimizer_attribute(model, "Method", 0)
    end
   
    ###########################################################
    # In each iteration of the algorithm, we solve the restricted
    # master problem (RMP), extract the optimal dual solution to 
    # RMP, use the optimal dual solution to find a column with 
    # negative reduced cost to add to RMP, and add that column
    # to RMP.
    ###########################################################

    while true

        # Terminate if we have added all of the columns
        if k == n
            break
        end

        # Solve the restricted master problem
        @timeit to "Solve restricted master problem using primal simplex" optimize!(model)
    
        # Extract the optimal dual solution to the restricted master problem
        if use_vectorized_update
            @timeit to "Extract the optimal dual solution (vectorized)" dual_solution .= dual.(Cons)
        else
            @timeit to "Extract the optimal dual solution (non-vectorized)" begin
                for i=1:m
                    dual_solution[i] = dual(Cons[i])
                end
            end
        end
    
        # Using the dual solution, we would normally search for a column 
        # (i.e., primal variable) which has negative reduced cost.
        # However, for the sake of simplifying this example, below we will 
        # arbitrarily choose the next (i.e., k+1th) column and add it
        # to the restricted master problem.
        @timeit to "Add new column to restricted master problem" begin

            # Update our counter for the number of columns in the restricted
            # master problem.
            k += 1

            # Add the new column to the restricted master problem.
            new_var = @variable(model,
                                [k], 
                                base_name = "x", 
                                lower_bound = 0, 
                                upper_bound = 1) 

            # Update the objective function of the restricted master problem.
            set_objective_coefficient(model, 
                                      new_var[k], 
                                      c[k])

            # Update the constraints of the restricted master problem
            for i=1:m
                
                # To improve the memory performance, only update the i-th
                # constraint if the coefficient A[i,k] is nonzero.
                if A[i,m] != 0
                    set_normalized_coefficient(Cons[i],
                                               new_var[k], 
                                               A[i,k])
                end
            end
        end
    end
    
    ###########################################################
    # Display the performance of the algorithm.
    ###########################################################
    show(to)
end

Below is the output of the above code when we use the non-vectorized function (i.e., use_vectorized_update=false):

  ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                         Time                   Allocations
                                                                 ──────────────────────   ───────────────────────
                        Tot / % measured:                             23.0s / 100%            4.31GiB / 100%

 Section                                                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Extract the optimal dual solution (non-vectorized)       6.00k    16.4s  71.4%  2.73ms   3.39GiB  78.7%   593KiB
 Solve restricted master problem using primal simplex     6.00k    5.02s  21.9%   837ΞΌs   3.11MiB  0.07%     544B
 Add new column to restricted master problem              6.00k    1.36s  5.93%   227ΞΌs    789MiB  17.9%   135KiB
 Initialize the restricted master problem (RMP)               1   98.1ms  0.43%  98.1ms   10.6MiB  0.24%  10.6MiB
 Generate a random, sparse linear optimization problem        1   73.9ms  0.32%  73.9ms    138MiB  3.13%   138MiB
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────

And below is the output of the above code when we use the vectorized function (i.e., use_vectorized_update=true):

 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                         Time                   Allocations
                                                                 ──────────────────────   ───────────────────────
                        Tot / % measured:                             22.0s / 100%            3.07GiB / 100%

 Section                                                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Extract the optimal dual solution (vectorized)           6.00k    14.9s  67.7%  2.48ms   2.15GiB  70.0%   375KiB
 Solve restricted master problem using primal simplex     6.00k    5.43s  24.7%   905ΞΌs   3.11MiB  0.10%     544B
 Add new column to restricted master problem              6.00k    1.47s  6.69%   245ΞΌs    789MiB  25.1%   135KiB
 Initialize the restricted master problem (RMP)               1    102ms  0.47%   102ms   10.6MiB  0.34%  10.6MiB
 Generate a random, sparse linear optimization problem        1   90.6ms  0.41%  90.6ms    138MiB  4.41%   138MiB
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Based on the above numerical results, I come to the conclusion to that extracting the optimal dual solution is the bottleneck due to its large amount of dynamic memory allocation.

Please let me know if the above example makes sense, and if I can provide additional information!

There’s a few things going on here:

Type instabilities

Julia can’t infer the type of Cons, which slows the code down. You can improve this using a function barrier.

https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions

Scalarized access by JuMP

JuMP queries each element individually, instead of querying the vector.

Overheads in Gurobi.jl

Even the Gurobi.jl code at the MOI level has some overheads:

Solutions

If you’re in direct-mode, you can use:

indices = index.(Cons)
dual_solution .= MOI.get(backend(model), MOI.ConstraintDual(), indices)

This skips some of the JuMP checks

If you’re happy with pure Gurobi, and you know how Gurobi orders the rows in the constraint matrix you can also do

GRBgetdblattrarray(
                    backend(model),
                    "Pi",
                    0,
                    length(dual_solution),
                    dual_solution,
                )

Here’s my solution

using JuMP
using Gurobi
using TimerOutputs


function main(use_vectorized_update=false)
    to = TimerOutput()
    @timeit to "Generate a random, sparse linear optimization problem" begin
        m = 3000
        n = 6000
        A = zeros(m, n)
        for j=1:n
            entries = rand(1:m, 2)
            A[entries[1],j] = rand()
            A[entries[2],j] = rand()
        end
        b = ones(m)
        c = ones(n)
    end
    @timeit to "Initialize the restricted master problem (RMP)" begin
        dual_solution = zeros(m)
        k = 1
        model = direct_model(Gurobi.Optimizer())::JuMP.Model
        set_silent(model)
        @variable(model, 0 <= x[1:k] <= 1)
        @constraint(model, Cons[i = 1:m], sum(A[i, j] * x[j] for j in 1:k) <= b[i])
        @objective(model, Max, sum(c[j] * x[j] for j in 1:k))
        set_optimizer_attribute(model, "Method", 0)
    end
    return main(use_vectorized_update, to, model, dual_solution, Cons, A, c)
end

function main(use_vectorized_update, to, model, dual_solution, Cons, A, c)
    indices = index.(Cons)
    m, n = size(A)
    for k in 2:n
        @timeit to "Solve restricted master problem using primal simplex" begin
            optimize!(model)
        end
        if use_vectorized_update
            @timeit to "Extract the optimal dual solution (vectorized)" begin
                dual_solution .= MOI.get(backend(model), MOI.ConstraintDual(), indices)
            end
        else
            @timeit to "Extract the optimal dual solution (non-vectorized)" begin
                GRBgetdblattrarray(
                    backend(model),
                    "Pi",
                    0,
                    length(dual_solution),
                    dual_solution,
                )
            end
        end
        @timeit to "Add new column to restricted master problem" begin
            new_var = @variable(
                model,
                base_name = "x",
                lower_bound = 0,
                upper_bound = 1,
            )
            set_objective_coefficient(model, new_var, c[k])
            for i = 1:m
                if A[i, m] != 0
                    set_normalized_coefficient(Cons[i], new_var, A[i, k])
                end
            end
        end
    end
    to, model
end

@time to, model = main(true)
@time to, model = main(false)

This drops things down to

────────────────────────────────────────────────────────────────────────────────────────
                                                 Time                   Allocations      
                                         ──────────────────────   ───────────────────────
            Tot / % measured:                 5.26s / 99.3%            148MiB / 100%     

 Section                         ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────
 Solve restricted master prob...  6.00k    5.05s  96.6%   841ΞΌs   3.30MiB  2.23%     576B
 Add new column to restricted...  6.00k   83.4ms  1.60%  13.9ΞΌs   2.32MiB  1.57%     406B
 Generate a random, sparse li...      1   66.2ms  1.27%  66.2ms    138MiB  93.8%   138MiB
 Extract the optimal dual sol...  6.00k   21.2ms  0.41%  3.54ΞΌs   93.7KiB  0.06%    16.0B
 Initialize the restricted ma...      1   6.84ms  0.13%  6.84ms   3.44MiB  2.33%  3.44MiB
 ────────────────────────────────────────────────────────────────────────────────────────
2 Likes

This is outstanding! My sincerest thanks once again for your help.

1 Like