Parallel pre-calculation of optimization model

I have a rather large optimization problem where creating the model (and writing an .lp file or passing it to a solver) takes a substantial amount of time (a few days), while solving it is (comparatively) quick. Since it is not advised to parallelize the actual creation of the JuMP model, I thought about two approaches:

  1. After creating the model, parallelize the .lp file writing (by writing parts to different files and only merging afterwards)

  2. Parallelizing the pre-processing to speed up the actual model creation.

(2) is the (in theory) easier approach, since a considerable amount of model-creation time is lost not on the actual usage of the @variable (or expression/constraint) macros, but on deciding which variables to construct, what their bounds are, the exact formulation of constraints, and so on. This could easily be done in parallel before actually trying to create the JuMP model, but it relies on one question that I am unsure how to (efficiently!) solve in Julia:

Consider the simple model structure

if condition
    constraint_1 = "x + y <= 7"
else
    constraint_1 = "x + 2*y <= 7"
end

(consider the string not as actual string but as some efficient way to describe a symbolic constraint)

I can check all these conditions for each constraint easily in a parallel fashion beforehand and “remember” that the correct constraint to construct for index 1 is “x + 2y <= 7”. How would I go about representing that and using it to generate

@constraint(m, x + 2y <= 7)

afterwards?
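
One possible answer, as a sketch only (the `ConstraintSpec` struct and the index convention are made up for illustration): store each decided constraint as plain data, i.e. variable indices, matching coefficients, and a right-hand side, and then build the actual constraints in a serial loop on the master process:

using JuMP

# Hypothetical plain-data description of one decided constraint.
struct ConstraintSpec
    vars::Vector{Int}        # indices into the model's variable vector
    coeffs::Vector{Float64}  # matching coefficients
    rhs::Float64
end

# Decided in parallel beforehand; "x + 2y <= 7" becomes:
spec = ConstraintSpec([1, 2], [1.0, 2.0], 7.0)

m = Model()
@variable(m, v[1:2])  # v[1] plays the role of x, v[2] of y

# Serial build on the master process: no branching left, just data.
expr = sum(c * v[i] for (i, c) in zip(spec.vars, spec.coeffs))
@constraint(m, expr <= spec.rhs)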

(Additional question: is there any resource on approach (1), the parallelization of .lp file writing?)


Welcome to the community!

Is it possible for you to provide a small example of model building that captures the steps needed?

What takes so much time exactly?

The time is spent in (a) deciding which constraints to construct, (b) building the model, and (c) writing it to a file.

Assuming that (b) only consists of calls to JuMP macros and (c) is not easily modifiable, the major speedup I’m looking for is in (a).

It boils down to what I’ve condensed in my post: based on a condition, a constraint is constructed in a specific way. I have (for a simplified model) around 1e9 constraints. They are constructed based on inputs from various configuration files.

Assuming that the way of storing/accessing the needed input data is as fast as possible (I’m converting some code from C++ here, so I’m not fully set on the correct representation - it’s a collection of mostly sparse and a few dense 2d matrices), the bottleneck is checking each set of inputs for the “condition” and deciding what the constraint should look like. This is extremely inefficient (no matter the data access) when done in a non-parallel way, since I have access to at least 256 cores.
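
A minimal sketch of what that parallel pre-processing could look like, assuming the per-constraint decisions are independent (`inputs` and `decide` are placeholders for the real configuration data and condition checks):

using Base.Threads

# Placeholder input data and decision logic; in the real model these come
# from the configuration files.
inputs = [(coeff = rand(), simple = rand(Bool)) for _ in 1:10_000]

decide(inp) = inp.simple ? (vars = [1], coeffs = [1.0], rhs = inp.coeff) :
                           (vars = [1, 2], coeffs = [1.0, 2.0], rhs = inp.coeff)

specs = Vector{NamedTuple}(undef, length(inputs))
@threads for i in eachindex(inputs)
    specs[i] = decide(inputs[i])  # decisions are independent, so thread-safe
end
# `specs` is then consumed by a serial JuMP-building loop on the master.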

The problem is how to represent the final constraints in a way that I can (after deciding in separate processes) efficiently just pass to JuMP to create them.

I can of course create the full matrix of coefficients and try to work directly with MOI, but I’m not sure that’s a good idea.
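
For reference, the direct MOI route would look roughly like this (a sketch; whether it actually beats the JuMP macros is something only profiling can tell):

using MathOptInterface
const MOI = MathOptInterface

model = MOI.Utilities.Model{Float64}()
x = MOI.add_variable(model)
y = MOI.add_variable(model)

# x + 2y <= 7, stated as a ScalarAffineFunction-in-LessThan constraint:
f = MOI.ScalarAffineFunction(
    [MOI.ScalarAffineTerm(1.0, x), MOI.ScalarAffineTerm(2.0, y)],
    0.0,
)
MOI.add_constraint(model, f, MOI.LessThan(7.0))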

(I can of course come up with a simple in-code example tomorrow, I’m just unfortunately not at my PC anymore)


A few days??!? Have you profiled where the time is being spent? Can you provide example code?

What solver are you using? Why write an LP file?

Read Design patterns for larger models · JuMP

Is it possible for you to provide a small example of model building that captures the steps needed?

if condition1
    @constraint(model, -coeff <= x <= coeff)
elseif condition2
    @constraint(model, -coeff <= x + y <= coeff)
elseif condition3
    @constraint(model, -coeff - z <= x + y <= coeff + z)
end

As far as I understand, no matter how I implement that, I’m always stuck with potentially branching during model creation (where I cannot parallelize) versus in “pre-processing” (where I can). Since there are many conditions that need to be checked (based on external parametrization) for a lot of variables and constraints, I am looking for a way to reduce the time spent in a single process to a minimum.

If possible this would lead to:

# this was earlier decided by any of X workers
symbolic_expr1 = "-coeff <= x + y <= coeff"

# this is now used on the "master process"
@constraint(model, symbolic_expr1)

And that’s where my question comes in: is there any way to solve this “symbolic expression” creation problem that is actually faster than a whole lot of branching? I’m looking for some kind of expression that can be used directly by JuMP without a need to “parse” it; otherwise I would lose much more than the few simple branches cost.
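
One candidate for such a “pre-parsed” representation is JuMP’s AffExpr, built programmatically with add_to_expression! so that no macro parsing is involved; a minimal sketch:

using JuMP

model = Model()
@variable(model, x)
@variable(model, y)
coeff = 5.0

# Built from plain data (e.g. tuples a worker decided on), no parsing involved:
expr = AffExpr(0.0)               # start from the zero affine expression
add_to_expression!(expr, 1.0, x)  # + 1.0 * x
add_to_expression!(expr, 1.0, y)  # + 1.0 * y

# Equivalent to @constraint(model, -coeff <= x + y <= coeff):
@constraint(model, -coeff <= expr <= coeff)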

A few days??!?

I know that’s not a “usual” time, but that is just due to problem size (with simple versions of the model at around 3e8 variables and 1.4e9 constraints, and “real” versions becoming way bigger).

What solver are you using?

That depends. None here, since solving it is not the concern. Usually Gurobi, sometimes CPLEX.

Why write an LP file?

Two reasons: (1) Re-solvability. It can sometimes be necessary to re-solve a model a few weeks later. This is currently done by storing a separate symbol map that helps with modifying a few coefficients directly in the .lp file and then re-solving. There are other approaches, but that’s the one we currently use (this is not really important here). (2) An easier way to benchmark the actual model creation: measuring “time-to-lp” allows for a comparison to two other implementations we have (yes, I am aware that “time-to-lp” is not a foolproof way to compare them, but it’s the one I need to report timings on for now).

Have you profiled where the time is being spent? Can you provide example code?

Regarding code, see above. As to a full profile + real code, I can provide that (sorry for not already including it), but I need to wait for physical access to the machine it’s on to boil it down to a minimum working example. I’ll update with that ASAP.

A few things:

  • This is the first time I’ve heard models with 1e9 constraints described as “simple”. Note that the current interfaces of Gurobi.jl and CPLEX.jl use Int32 for their indices, so you’re limited to 2.1e9 variables and constraints.
  • What is the problem? Do you really need this many variables and constraints? Do they have a meaningful impact on the final solution? You should consider ways to approximate parts of the model to make it more tractable.
  • Why are you using JuMP for this? It might not be the best tool for the job. Problems on this scale might be better solved using a solver’s specific interface, e.g., in C or C++. You might even be better off writing custom code to write an LP file in parallel and not using JuMP or any solver code.
  • How are you writing the LP file? model = Model(); write_to_file(model, "model.lp")? Note that this will result in two copies of the model being created: one in JuMP and the other in the LP writer. Are you running out of memory?
  • Profile where the time is being spent. Even for a model with a billion constraints, I wouldn’t expect JuMP to take days to generate it. That suggests something is wrong that could be improved. Is it in the @variable and @constraint calls? Computing the if-else statements? Or in write_to_file?
  • Why LP files? You might be better off using MPS, which is a little more optimized for larger models (see the snippet below).
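
On the MPS point: JuMP infers the writer from the file extension, so switching formats should be a one-line change:

using JuMP
model = Model()
# ... build model ...
write_to_file(model, "model.mps")  # format inferred from the .mps extension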

This is the first time I’ve heard models with 1e9 constraints described as “simple”.

It’s maybe large, but it consists of around 10 different types of constraints that are just repeated for a whole lot of entities & timesteps. So yes, large, but extremely simple and very efficiently solvable (that’s why generating the problem is the bottleneck compared to solving it).
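
For illustration, one such constraint type stamped out over entities and timesteps might look like this (names and sizes are placeholders):

using JuMP

E, T = 100, 24  # placeholder sizes; the real model has 30-60k entities
model = Model()
@variable(model, flow[1:E, 1:T] >= 0)
cap = rand(E)

# One constraint type, repeated for every entity and timestep:
@constraint(model, [e in 1:E, t in 1:T], flow[e, t] <= cap[e])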

What is the problem? Do you really need this many variables and constraints? Do they have a meaningful impact on the final solution?

It’s a mobility/transport “flow” problem. Do we really need this many variables and constraints? Kind of - most of the time no, but the model scales with the number of entities and the temporal resolution (and the goal is to use this approach to reach a much higher resolution than we previously could). Do they have a meaningful impact on the final solution? Unfortunately yes, since it’s the basis of the problem that an optimal decision for each entity must be determined.

Why are you using JuMP for this?

We do have a custom C++ implementation and a benchmark (modified & compiled) Pyomo prototype. It’s just that maintaining the C++ and making it usable for non-tech-savvy people is too time-consuming.

How are you writing the LP file? model = Model(); write_to_file(model, "model.lp")?

Yeah, that was my naive approach. Is there a better way to write the file? (I know using direct mode, with constraints formulated in the way the specific solver expects them, should be way better in the end; it’s currently only about testing performance.)

Profile where the time is being spent

It’s actually a large chunk in garbage collection (somewhere around 30-50% GC time gets reported). Disabling GC is a huge speedup, but I’m not sure it would be feasible on the full-scale problem. One “entity” (roughly consisting of 10k variables and 20k constraints) takes around 15 MiB without GC. Since I need somewhere between 30-60k entities and have 1 TB RAM (for the model and for loading the parameters/configuration), I’m unsure that I can build the whole model with GC disabled. (Or can I force it to only run once every X minutes?)
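
On that last question: GC can be toggled manually with GC.enable, so one pattern, memory permitting, is to disable it during the build and trigger collections at chosen points. A sketch:

GC.enable(false)          # stop automatic collections
for batch in 1:1_000
    # ... build the constraints for one batch of entities ...
    if batch % 100 == 0
        GC.enable(true)   # re-enable so a manual collection actually runs,
        GC.gc()           # collect now,
        GC.enable(false)  # then switch automatic GC off again
    end
end
GC.enable(true)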

Why LP files?

Only because the custom C++ implementation can only produce LPs, so I’m stuck with that for comparison purposes.

Pyomo prototype

How long does Pyomo take?

Profile where the time is being spent

  • How long does it take to build the model? (See the timing sketch below.)
  • How much of that is spent computing indices, etc.?
  • How long does write_to_file take?
  • Are you following the Julia performance tips?
  • Do you have global variables?
  • Are you avoiding unnecessary computations?
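
For the first three, split timing plus a sampled profile already goes a long way; a sketch with a toy build_model! standing in for the real build:

using JuMP, Profile

function build_model!(model, n)
    @variable(model, x[1:n] >= 0)
    @constraint(model, [i in 1:n], x[i] <= i)
    return model
end

model = Model()
@time build_model!(model, 10_000)       # wall time, allocations, GC share
@time write_to_file(model, "model.lp")  # file-writing time, separately

Profile.clear()
@profile build_model!(Model(), 10_000)  # sampled profile of the build itself
Profile.print(maxdepth = 12)            # shows where the time actually goes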

Is there a better way to write the file?

write_to_file is inefficient because:

  • JuMP builds a model
  • JuMP copies that model into a MOI.FileFormats.LP.Model, potentially reformulating your model into constraints that the LP writer supports
  • The LP writer writes to file

This results in two copies of your model in memory. In addition, the LP writer has not been well optimized for files this large.

I know using direct mode with constraints formulated in the way the specific solver expects them should be way better in the end, it’s currently only about testing performance

Better performance means not duplicating the model. Try the following (being aware of the Int32 limitation):

using JuMP, Gurobi
model = direct_model(Gurobi.Optimizer())
# ... build model ...
GRBwrite(backend(model), "model.lp")

This will result in only one copy of the model, and it will use Gurobi’s optimized LP writer.

(How big is the resulting LP file???)


Sorry if this is only noise (possibly it is), but maybe it helps to explain why we need further details to understand what is going on. This is the time necessary to build and solve a simple problem with ~10^8 variables and bounds (not constraints). I am aware this is not representative of anything that might be related to the OP’s problem, but maybe it clarifies what we need to understand in order to help:

julia> using SPGBox

julia> function solve(n)
           x = 500*rand(n); lower=zeros(n); upper = 100*ones(n)
           spgbox!(sum, (g,x) -> g .= 1, x, lower=lower, upper=upper)
       end
solve (generic function with 1 method)

julia> @time solve(10^8); 
  6.965002 seconds (174.65 k allocations: 5.969 GiB, 4.69% gc time, 3.57% compilation time)

julia> @time solve(2*10^8); 
 13.156738 seconds (21 allocations: 11.921 GiB, 4.40% gc time)

I cannot run anything larger. But nothing there could imaginably take “days”, so this is really outside the scope of the normal optimization problems I know.
