Excessive memory usage when adding millions of variables to a JuMP Model

Hi,

I’m solving a large MIP using JuMP + Gurobi. My approach dynamically adds variables to the model (similar to column generation). In some cases, up to 10 000 000 variables are added within a single step. During this step, the memory usage explodes. Julia consumes all available RAM on my desktop, and on a cluster with 16 GB per process I obtain a memory error. However, after this step, the model uses only around 1.2 GB according to Base.summarysize.
The model has roughly 1000 constraints and 50 million nonzeros. Profiling suggests that most of the memory is allocated within set_normalized_coefficient.

Why does adding variables cause such a high memory usage? Is this due to internal JuMP structures? Would building the model directly using the Gurobi C API (or MOI) help reduce memory consumption?
Here is a simplified example of my code:

using Random 
using StatsBase 
using JuMP
using Gurobi 

function obtain_new_columns(num_nodes, num_routes)
    new_columns= Vector{Vector{Int64}}()

    for i=1:num_routes 
        route= sample(1:num_nodes, 20; replace= false)
        push!(new_columns, route)
    end 

    return new_columns 
end 

Random.seed!(1234) 
num_nodes= 100
model = direct_model(Gurobi.Optimizer())
set_string_names_on_creation(model, false)

x= Vector{VariableRef}()
@constraint(model, constr[i=1:num_nodes], 0 <= 1)
for num_routes in [10000, 9990000]#while true 
    new_routes= obtain_new_columns(num_nodes, num_routes)
    for route in new_routes 
        push!(x, @variable(model, binary=true))
        
        set_objective_coefficient(model, x[end], 1)
        for i in route 
            set_normalized_coefficient(constr[i], x[end], 1)
        end
    end 
end  
println("Memory usage model: ", round(Base.summarysize(model)/1024^3, digits=2), " GB")

Thanks a lot!


You should not hope to be able to solve this problem. Even if you could build it, it is likely too large for Gurobi (or any solver) to solve.

In (very very oversimplified) generalities:

  • 10^4 is easy
  • 10^5 is medium
  • 10^6 is hard
  • 10^7 is very hard

Note that Base.summarysize measures only the memory of Julia objects. It cannot measure allocations made in C, which is where Gurobi lives.
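A minimal stdlib-only illustration of this point: a Julia wrapper around a C pointer contributes almost nothing to Base.summarysize, no matter how much the C side has allocated (the CHandle struct here is just for illustration):

```julia
struct CHandle
    ptr::Ptr{Cvoid}   # stand-in for e.g. a Gurobi model handle
end

buf = Libc.malloc(10^6)   # 1 MB allocated on the C heap
h = CHandle(buf)
Base.summarysize(h)       # a handful of bytes; the C-side 1 MB is invisible
Libc.free(buf)
```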


I’ve been experimenting with some large models (on the order of 1–10 million variables) recently. My laptop has 16 GB RAM and an 11th-gen Intel chip.

The differences you might notice with large models are:

  1. The modeling time with JuMP is longer.
  2. There is a perceptible delay after you call optimize!(model) and before you see the first line of Gurobi’s logging (assuming the logging was not silenced).
  3. Gurobi might still be able to solve the large model (you can monitor its progress via the logging) if the problem itself is numerically well-behaved. But sometimes you might see a warning like “Markowitz shrunk to …”.
  4. If the model is an MIP whose solve entails a lot of branching, you might end up receiving the termination status MEMORY_LIMIT after some large number of nodes has been explored. (That is to say, if my computer had more RAM, Gurobi might still have a chance to solve it.)
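On point 4: rather than letting the OS kill the process, you can ask Gurobi to stop gracefully using its memory parameters. A sketch (MemLimit and SoftMemLimit are Gurobi parameters, specified in GB; running this requires a Gurobi license):

```julia
using JuMP, Gurobi

model = Model(Gurobi.Optimizer)
# SoftMemLimit makes Gurobi terminate cleanly with the best bound found,
# surfacing in JuMP as termination_status(model) == MOI.MEMORY_LIMIT,
# whereas the hard MemLimit aborts with an error instead.
set_attribute(model, "SoftMemLimit", 12.0)   # GB
```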

@odow Thank you for pointing out this issue! I wasn’t aware of that detail about summarysize.
I understand that problems of this size are difficult to solve. However, for some large models it’s still possible to achieve a limited optimality gap.
@WalterMadelim I also recognize all of your points.

However, I still have the feeling that (some characteristic of) JuMP contributes significantly to the high memory usage. The example below builds the model in two different ways: one using JuMP and one using the C API. The final memory usage of both models is similar. However, the maximum memory usage of the Gurobi environment is twice as high when using JuMP. I also observe this when monitoring the Julia process through the task manager.

Do you have any idea what might be causing the higher memory usage? Is there a way to reduce the maximum memory usage without building the entire model using the C API?

using Random
using StatsBase 
using JuMP
using Gurobi 
using MathOptInterface

function obtain_new_columns(num_nodes, num_routes)
    new_columns= Vector{Vector{Int64}}()

    for i=1:num_routes 
        route= sample(1:num_nodes, 20; replace= false)
        push!(new_columns, route)
    end 

    return new_columns 
end 

function model_jump(routes, num_nodes)
    model = direct_model(Gurobi.Optimizer())
    set_string_names_on_creation(model, false)

    x= Vector{VariableRef}()
    @constraint(model, constr[i=1:num_nodes], 0 <= 1)

    for route in routes 
        push!(x, @variable(model, binary=true))
        
        set_objective_coefficient(model, x[end], 1)
        for i in route 
            set_normalized_coefficient(constr[i], x[end], 1)
        end
    end 
    
    @constraint(model, sum(x) <= 0)
    optimize!(model)

    return MOI.get(model, Gurobi.ModelAttribute("MemUsed")), MOI.get(model, Gurobi.ModelAttribute("MaxMemUsed")) 
end 


function model_api(routes, num_nodes)
    env_p= Ref{Ptr{Cvoid}}()
    error = Gurobi.GRBemptyenv(env_p)
    env= env_p[]
    error= GRBstartenv(env)
    model_p = Ref{Ptr{Cvoid}}()
    error= GRBnewmodel(env, model_p, "test", 0, C_NULL, C_NULL, C_NULL, C_NULL, C_NULL)
    model = model_p[]
    for i=1:num_nodes 
        rhs=1.0
        error= GRBaddconstr(model, 0, Cint[], Cdouble[], GRB_LESS_EQUAL, rhs, C_NULL)    
    end 
    #=
    for (i,route) in enumerate(routes)
        ind= Cint[(route .-1)...]
        val= Cdouble[1 for i=1:length(route)]
        nzc = length(ind)
        error = GRBaddvar(model, nzc, ind, val, 1.0, 0.0, 1.0, GRB_BINARY, C_NULL)
    end
    =#
    for (i,route) in enumerate(routes)
        GRBaddvar(model, 0, Cint[], Cdouble[], 0.0, 0, 1, GRB_BINARY, C_NULL)
        error = GRBsetdblattrelement(model, "Obj", i-1, 1.0)
        for j in route 
            ind= Cint[j-1]
            var= Cint[i-1]
            val= Cdouble[1]
            nzc= length(ind)
            error= GRBchgcoeffs(model, nzc, ind, var, val) 
        end 
    end
    error= GRBaddconstr(model, length(routes), Cint[i for i=0:length(routes)-1], Cdouble[1.0 for i=1:length(routes)], GRB_LESS_EQUAL, 1.0, C_NULL)     
    
    error = GRBoptimize(model) 

    mem_used= Ref{Cdouble}()
    error= GRBgetdblattr(model, "MemUsed", mem_used)
    mem_used = mem_used[]

    max_mem_used= Ref{Cdouble}()
    error= GRBgetdblattr(model, "MaxMemUsed", max_mem_used)
    max_mem_used = max_mem_used[]

    return mem_used, max_mem_used
end
Random.seed!(1234) 
num_nodes= 100

# compile 
routes= obtain_new_columns(num_nodes, 100)
@time mem_used_jump, max_mem_used_jump= model_jump(routes, num_nodes)
@time mem_used_api, max_mem_used_api= model_api(routes, num_nodes)

# test 
routes= obtain_new_columns(num_nodes, 5000000)
@time mem_used_jump, max_mem_used_jump= model_jump(routes, num_nodes)
println("Jump: Memory used: ", mem_used_jump, " GB, Max memory used: ", max_mem_used_jump, " GB")

@time  mem_used_api, max_mem_used_api= model_api(routes, num_nodes)
println("API: Memory used: ", mem_used_api, " GB, Max memory used: ", max_mem_used_api, " GB")

which gives the output

 66.555772 seconds (540.00 M allocations: 11.572 GiB, 4.37% gc time)
Jump: Memory used: 5.199141979 GB, Max memory used: 9.610150477 GB

 55.942253 seconds (310.00 M allocations: 18.533 GiB, 13.41% gc time)
API: Memory used: 4.963454876 GB, Max memory used: 5.223455684 GB

Do you have any idea what might be causing the higher memory usage? Is there a way to reduce the maximum memory usage without building the entire model using the C API?

Your codes are not equivalent. JuMP sets the constraint coefficients one-by-one, not as a single list. JuMP also sets the GRB_BINARY and objective coefficients in separate calls.
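For instance, the column data can be accumulated first and submitted in one batched call (as the commented-out GRBaddvar branch in your C-API code does). A stdlib-only sketch of gathering that data, assuming routes hold 1-based node indices:

```julia
using SparseArrays

# Gather all (row, col, value) triplets so they can be handed to a
# batched API call instead of one GRBchgcoeffs call per nonzero.
function column_data(routes::Vector{Vector{Int}})
    I, J, V = Int[], Int[], Float64[]
    for (j, route) in enumerate(routes)
        append!(I, route)                   # constraint (node) indices
        append!(J, fill(j, length(route)))  # this column's index
        append!(V, ones(length(route)))     # all coefficients are 1.0
    end
    return sparse(I, J, V)
end

A = column_data([[1, 3], [2, 3]])   # 3×2 matrix with 4 stored nonzeros
```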


Thanks for pointing this out. I have updated the example so that the operations are now exactly the same. This changed the runtime and the number of allocations somewhat. However, the difference in maximum memory usage is still there… (also when monitoring the task manager).


Can you share the new code? Gurobi is a black box, so it’s a bit hard to say what might be the cause.

I have updated the code in the example, so that is already available. Are there general recommendations to limit the (additional) memory usage when using JuMP+Gurobi (e.g., how I build the model) compared to the C API?

I can’t explain the difference, sorry. This is a question for @torressa (who works at Gurobi)


These memory attributes are also of limited use, as they capture the memory only after calling optimize. You’d want to profile the process to compare the memory used by the different APIs. I did this using psrecord, and they look very similar:

JuMP: (psrecord memory profile; image not included)

C-API: (psrecord memory profile; image not included)
The difference in MaxMemUsed between the solving processes comes from the difference in presolve behaviour:

24,26d27
< *Replay* Change 5000000 variable types
< *Replay* Update Gurobi model
< *Replay* Update Gurobi model
34c38
< Model fingerprint: 0xeef581b8
---
> Model fingerprint: 0xa3af83c3
39c43
<   Bounds range     [0e+00, 0e+00]
---
>   Bounds range     [1e+00, 1e+00]
42,47d45
< Presolve removed 0 rows and 0 columns (presolve time = 8s)...
< Presolve removed 0 rows and 5000000 columns (presolve time = 13s)...
< Presolve removed 101 rows and 5000000 columns (presolve time = 15s)...
< Presolve removed 101 rows and 5000000 columns
< Presolve time: 15.18s
< Presolve: All rows and columns removed
49c47
< Explored 0 nodes (0 simplex iterations) in 24.62 seconds (15.74 work units)
---
> Explored 0 nodes (0 simplex iterations) in 43.75 seconds (3.17 work units)
56,57c54
< *Replay* Free Gurobi model
< *Replay* Free Gurobi environment
---
> *Replay* Reached end of replay file
59a57,58
>
> *Replay* Models leaked: 1
62,63c61,62
< *Replay* Gurobi API routine runtime: 30.73s
< *Replay* Gurobi solve routine runtime: 24.91s
---
> *Replay* Gurobi API routine runtime: 24.13s
> *Replay* Gurobi solve routine runtime: 43.89s

I got this by using a recording file, which also shows the time spent in the API (a bit less in the C-API case); see “How do I create and use a recording file?”.

The difference in the model is:

0a1
> \ Model test
17766957c17766958
<    + C4999997 + C4999998 + C4999999 <= 0
---
>    + C4999997 + C4999998 + C4999999 <= 1

Thank you for your explanation. I did not know of psrecord and the recording file, which really helped me to solve the issue, thanks!

After fixing the typo in the constraint, JuMP still used way more memory compared to the C API.


The JuMP variant still had additional presolve steps, which caused the higher memory consumption (as you also indicated). The difference between the two models was the variable bounds (as could also be observed from the ‘Bounds range’ in the log files). Adding a variable with @variable(model, binary = true) results in a lower bound of -infinity and an upper bound of +infinity. However, adding a binary variable with GRBaddvar(model, 0, Cint[], Cdouble[], 0.0, 0, GRB_INFINITY, GRB_BINARY, C_NULL) results in a lower bound of 0 and an upper bound of 1.

Changing from @variable(model, binary = true) to @variable(model, binary = true, lower_bound = 0, upper_bound = 1) results in the same variable bounds for JuMP and the C API, and therefore the presolve step for the JuMP variant no longer occurs. Then, there is only limited memory overhead for JuMP (15%).
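As a solver-free check (using only JuMP itself), you can verify that binary = true alone attaches no bound constraints to the variable:

```julia
using JuMP

model = Model()
@variable(model, a, binary = true)                                  # no explicit bounds
@variable(model, b, binary = true, lower_bound = 0, upper_bound = 1)

has_lower_bound(a), has_upper_bound(a)  # (false, false)
has_lower_bound(b), has_upper_bound(b)  # (true, true)
```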

The only difference is how the model is built, as the recording file has the additional lines:

*Replay* Change 5000000 variable bounds
 
---

*Replay* Change 2500000 variable types

*Replay* Update Gurobi model

*Replay* Update Gurobi model

So apparently, most memory consumption was due to Gurobi’s presolve and not JuMP itself. Adding variables with JuMP is slower, but introduces only limited memory overhead. And using @variable(model, binary = true, lower_bound = 0, upper_bound = 1) instead of @variable(model, binary = true) results in a model similar to the one you would get by using the C API directly.

Thanks for the help!

using StatsBase 
using JuMP
using Gurobi 
using MathOptInterface
using Random 
function obtain_new_columns(num_nodes, num_routes)
    new_columns= Vector{Vector{Int64}}()

    for i=1:num_routes 
        route= sample(1:num_nodes, 20; replace= false)
        push!(new_columns, route)
    end 

    return new_columns 
end 

function model_jump(routes, num_nodes)
    env = Gurobi.Env(Dict{String, Any}("Record" => 1,))
    model = direct_model(Gurobi.Optimizer(env))
    set_string_names_on_creation(model, false)

    x= Vector{VariableRef}()
    @constraint(model, constr[i=1:num_nodes], 0 <= 1)

    for route in routes 
        push!(x, @variable(model, binary=true, lower_bound= 0, upper_bound= 1.0))
        #push!(x, @variable(model, binary=true))
        
        set_objective_coefficient(model, x[end], 1)
        for i in route 
            set_normalized_coefficient(constr[i], x[end], 1)
        end
    end 
    
    @constraint(model, sum(x) <= 0)
    #error = GRBwrite(backend(model), "jump2.lp")
    #error = GRBwrite(backend(model), "jump2.mps")
    optimize!(model)


    return MOI.get(model, Gurobi.ModelAttribute("MemUsed")), MOI.get(model, Gurobi.ModelAttribute("MaxMemUsed")) 
end 


function model_api(routes, num_nodes)
    env_p= Ref{Ptr{Cvoid}}()
    error = Gurobi.GRBemptyenv(env_p)
    env= env_p[]
    error = GRBsetintparam(env, "Record", 1)
    error= GRBstartenv(env)
    model_p = Ref{Ptr{Cvoid}}()
    error= GRBnewmodel(env, model_p, "test", 0, C_NULL, C_NULL, C_NULL, C_NULL, C_NULL)
    model = model_p[]
    for i=1:num_nodes 
        rhs=1.0
        error= GRBaddconstr(model, 0, Cint[], Cdouble[], GRB_LESS_EQUAL, rhs, C_NULL)    
    end 
    #=
    for (i,route) in enumerate(routes)
        ind= Cint[(route .-1)...]
        val= Cdouble[1 for i=1:length(route)]
        nzc = length(ind)
        error = GRBaddvar(model, nzc, ind, val, 1.0, 0.0, 1.0, GRB_BINARY, C_NULL)
    end
    =#
    for (i,route) in enumerate(routes)
        #GRBaddvar(model, 0, Cint[], Cdouble[], 0.0, 0.0, 1.0, GRB_BINARY, C_NULL)
        #GRBaddvar(model, 0, Cint[], Cdouble[], 0.0, -GRB_INFINITY, GRB_INFINITY, GRB_BINARY, C_NULL)
        GRBaddvar(model, 0, Cint[], Cdouble[], 0.0, 0.0, GRB_INFINITY, GRB_BINARY, C_NULL)
        
        error = GRBsetdblattrelement(model, "Obj", i-1, 1.0)
        for j in route 
            ind= Cint[j-1]
            var= Cint[i-1]
            val= Cdouble[1]
            nzc= length(ind)
            error= GRBchgcoeffs(model, nzc, ind, var, val) 
        end 
    end
    error= GRBaddconstr(model, length(routes), Cint[i for i=0:length(routes)-1], Cdouble[1.0 for i=1:length(routes)], GRB_LESS_EQUAL, 0.0, C_NULL)     
    #error = GRBwrite(model, "api.lp")
    #error = GRBwrite(model, "api.mps")
    
    error = GRBoptimize(model) 

    mem_used= Ref{Cdouble}()
    error= GRBgetdblattr(model, "MemUsed", mem_used)
    mem_used = mem_used[]

    max_mem_used= Ref{Cdouble}()
    error= GRBgetdblattr(model, "MaxMemUsed", max_mem_used)
    max_mem_used = max_mem_used[]

    return mem_used, max_mem_used
end
Random.seed!(1234) 
num_nodes= 100

# compile 
routes= obtain_new_columns(num_nodes, 100)
@time mem_used_jump, max_mem_used_jump= model_jump(routes, num_nodes)
@time mem_used_api, max_mem_used_api= model_api(routes, num_nodes)

# test 
routes= obtain_new_columns(num_nodes, 2500000)
@time mem_used_jump, max_mem_used_jump= model_jump(routes, num_nodes)
println("Jump: Memory used: ", mem_used_jump, " GB, Max memory used: ", max_mem_used_jump, " GB")

@time  mem_used_api, max_mem_used_api= model_api(routes, num_nodes)
println("API: Memory used: ", mem_used_api, " GB, Max memory used: ", max_mem_used_api, " GB")

102.146279 seconds (270.00 M allocations: 5.788 GiB, 2.49% gc time)
Jump: Memory used: 2.603016821 GB, Max memory used: 2.733017629 GB

73.547856 seconds (155.00 M allocations: 9.267 GiB, 3.76% gc time)
API: Memory used: 2.536407267 GB, Max memory used: 2.666408075 GB

I think this is a good example of how much of a black box a solver is. Small changes in the model formulation, like adding bounds to a binary variable, can cause large changes in the solver’s internal algorithm flow, even leading to a doubling of memory usage. I’ll try to remember this post when a similar question inevitably comes up in the future :smile:


Gurobi has a nice introductory post on performance variability in MIP solvers: https://support.gurobi.com/hc/en-us/articles/26989876720657-What-is-performance-variability

Seemingly inconspicuous changes like re-ordering constraints or variables, or changing the random seed, can dramatically affect overall runtime.
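For example, a minimal variability experiment re-solves the same model under different values of Gurobi’s Seed parameter. A sketch (build_model is a hypothetical helper returning a fresh JuMP model; running this requires a Gurobi license):

```julia
using JuMP, Gurobi

# Solve the identical model with several random seeds and compare runtimes.
for seed in 0:4
    model = build_model()            # hypothetical: returns a fresh JuMP model
    set_optimizer(model, Gurobi.Optimizer)
    set_attribute(model, "Seed", seed)
    optimize!(model)
    println("seed = $seed: ", solve_time(model), " s")
end
```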