Slow generation of a large LP model in JuMP

NiclasMattsson · April 3, 2017, 7:49pm

[Wall of text incoming, beware!]

I’m a Julia novice and have been playing around with JuMP over the last month or so. All my small scale test models worked beautifully, so I finally decided to dive into the deep end of the pool and try a full scale model.

So I spent last week translating a global energy system model from GAMS to JuMP. The model has roughly 1500 lines of code and reads input data from three external spreadsheets. It generates a pure LP problem with 7-30 million equations (depending on parameterization) and roughly the same number of variables. But it’s just an LP so CPLEX solves it easily in about 30 seconds on a standard quad-core desktop. That’s the smallest version of the model, larger versions may need up to a few hours.

The JuMP version works, that’s the good news. However, it’s disappointingly slow: GAMS generates the model and sends it to CPLEX in 10 seconds, but my JuMP implementation needs 220 seconds. Almost all this time is spent generating the model constraints. JuMP also eats about twice the memory (but my computer has plenty of RAM so there’s no slowdown due to memory swapping). Interestingly, model generation time doesn’t scale much with model size: a larger version of my model with twice the rows and twice the columns generates in 260 seconds.

I’ve seen a few benchmarks that suggest that JuMP should be roughly as fast as GAMS, so I suppose my implementation is to blame for these performance issues. I hope I can get some pointers on how to speed things up. Here’s a summary of what I’ve done.

Most “sets” of the model (using GAMS terminology) are vectors of symbols. The only exception is the time set, which is a vector of integers. Examples:

primary = [:bio1, :bio2, :hydro, :wind, :solar, :gas1, :gas2, :oil1, :oil2, :coal1, :coal2, :uranium1, :uranium2]
sector = [:elec, :central_heat, :dist_heat, :solid_heat, :non_solid_heat, :feedstock, :transport]
region = [:EUR, :NAM, :PAO, :FSU, :LAM, :CPA, :AFR, :SAS, :PAS, :MEA]
time = collect(2010:10:2150)

Model “parameters” are NamedArrays of Float64, indexed by the sets above. I understand that I am giving up some performance by not using ordinary arrays, but it is important for readability to be able to write demand[:elec,:EUR,t] instead of demand[indexof(sector,:elec), indexof(region,:EUR), t]. The latter gets extremely messy when you have lots of that stuff in the same equation. Also, modeling languages like GAMS and Ampl provide indexing by name, so the comparison is fair.

And here are a few examples of model constraints. Convention: model variables begin with a capital letter and parameters begin with a lower case letter.

@constraint(m, supply_c[i in energy_in, r in region, t in time],
  Supply[i,r,t] - En_export[i,r,t] + En_import[i,r,t] == sum(En_conv[i,o,r,t] for o in energy_out)
)
@constraint(m, capacity_c[i in energy_in, o in energy_out, r in region, t in time;  o != :elec],
  En_conv[i,o,r,t] * effic[i,o,r,t] <= Capacity[i,o,r,t] * cf[i,o] * 8760*3600/1e6
)
@constraint(m, market_growth_c[i in energy_in, o in energy_out, r in region, t in time;  t > t0],
  Cap_invest[i,o,r,t] <= Cap_invest[i,o,r,t-t_step] * (1+maxgrowth)^t_step + seed_capacity
)
@constraint(m, cost_capacity_c[r in region, t in time],
  Cost_capacity[r,t] == sum(Cap_invest[i,o,r,t] * invest_cost[i,o,t] for i in energy_in, o in energy_out)
)

Incidentally, I would much prefer to use the @constraints m begin ... end syntax since the list of constraints would become much cleaner without the repeated @constraint macros and additional parentheses. However, when debugging the model, all errors were reported with the line number of the @constraints statement. I had to switch to the uglier syntax to get the correct line number of the error.

I initially had the entire model in the global scope, but split up into a number of smaller files using include() statements. I gained a bit of speed when I put everything in the same file and wrapped it all in a function. However, that means I now have an unmanageable file with 1500+ lines. Is there a good way to split it up again but still have everything in the same non-global scope? Using include() doesn’t work inside a function.

Thanks in advance for any help!

joaquimg · April 4, 2017, 1:03pm

Have you encapsulated all you code inside a function or you have been running in a global scope?
(running code in global scope with global variables tends to be much slower)
Have you measured time on a second run?
(The first time you run you code it is compiled)

NiclasMattsson · April 4, 2017, 2:03pm

See the last paragraph of my post.

Yes, the speeds I report are second runs.

miles.lubin · April 4, 2017, 2:49pm

I’d recommend using Julia’s profiling utilities to figure out where most of the time is being spent, plus the standard best practices of making sure that the code is structured in a way that allows Julia’s type inference to perform as it was designed to do (e.g., with @code_warntype).

Another important point to be aware of is that JuMP’s macros translate statements directly into for loops.

For example,

 @constraint(m, market_growth_c[i in energy_in, o in energy_out, r in region, t in time;  t > t0],
  Cap_invest[i,o,r,t] <= Cap_invest[i,o,r,t-t_step] * (1+maxgrowth)^t_step + seed_capacity
)

would become something like

for i in energy_in
    for o in energy_out
        for r in region
            for t in time
                if t > t0
                    market_growth_c[i,o,r,t] = @constraint(m, Cap_invest[i,o,r,t] <= Cap_invest[i,o,r,t-t_step] * (1+maxgrowth)^t_step + seed_capacity)
                end
            end
        end
    end
end

If the statement t > t0 is false most of the time, you’ll be doing a lot of unnecessary work. When using JuMP, it’s your job to write your model in a way avoids these sorts of situations, just as you would when writing normal code.

joaquimg · April 4, 2017, 4:20pm

In the limit you can write a function for each constraint (and variable) if you want, and then each function can go into a separate file. All you have to do is pass the model m and the parameters relevant to that constraint. You will have to put variable in the function scope either by using getvariable os simply by passing it to the function.

leethargo · April 4, 2017, 7:45pm

@miles.lubin: Do you give a guarantee about the order of the nested for loops? I guess you are suggesting to put the for t in time as the outermost loop. Could this be achieved reliably by putting it first after the contraint name?

miles.lubin · April 4, 2017, 7:54pm

Yes, they will match the order that you wrote them in. However, the conditions specified by using the [x in X; foo(x)] syntax will always be checked at the lowest level, so just rearranging the indices won’t address the issue above. I’m not a big fan of JuMP’s [x in X; foo(x)] syntax anymore. Julia’s generator syntax like sum(F(x) for x in X if foo(x) for y in Y if foo(x,y)) allows you direct control of how to nest the conditions with the iteration sets.

bbrunaud · April 15, 2018, 5:45am

GAMS has something neat that JuMP doesn’t have. It doesn’t create variables that are not actually used in any constraint. That way the model sent to the solver is much smaller, and that explains the difference in performance.

The solution in JuMP (or at least my approach to the problem) is to create dictionaries of arrays. For example, in your case the set energy_in and the set energy_out. Probably, not all the (i,o) combinations are allowed. So you have to a little previous work. Define a dictionary energy_out that given an input of i it gives you an array of allowed elements of energy out

Example

origins = [:A :B :C]
destinations = Dict(
:A => [1,2],
:B => [2],
:C => [2,3]
)

@variable(m, x[i in origins, j in destinations[i])

julia> x.tupledict
Dict{Tuple{Any,Any},JuMP.Variable} with 5 entries:
  (:C, 2) => x[C,2]
  (:B, 2) => x[B,2]
  (:A, 2) => x[A,2]
  (:A, 1) => x[A,1]
  (:C, 3) => x[C,3]

Now you have a sparse variable indexed with symbols. You only need to have the precaution to never call x[:B,1] for example, because it’s not defined. So you have to drag the same definition to your constraints, or put if clauses in your summations, same as you would do in GAMS with your $ signs.

regards!

jayce_ram · April 16, 2018, 1:38pm

thanks a lot @bbrunaud! Do you know if it’s possible to specify the combinations by using the informations contained in a spreadsheet for example?

ExpandingMan · April 16, 2018, 2:04pm

Note that, as I’ve pointed out elsewhere, you can also use sparse arrays. I personally do not find Dicts useful as they are not mathematical objects with well-defined properties; sparse arrays are.

You can do absolutely anything you want. The great thing about JuMP (and Julia) is how generic it is, you can use absolutely any type of array or Julia object you want. If you want you can even combine the use of sparse arrays and Dicts. I would suggest reading up on sparse arrays and looking for the appropriate IO package for whatever data format you are reading in. Since you mention “spreadsheet”, I imagine that CSV.jl might be a good start.

bbrunaud · April 17, 2018, 4:10am

Ok, say you have the same example above.

origins = [:A, :B, :C]

Now you want to construct the destinations dict from a file where you have the allowed combinations, ij.csv

using CSV

d = readtable("ij.csv")

destinations = Dict(i => [] for i in origins)
for i in 1:size(d,1)
  push!(destinations[d[i,1]], d[i,2])
end

If you want it to be Excel instead of CSV, check ExcelReaders, or this code I wrote (experimental)

jdlara-berkeley · April 24, 2018, 10:24pm

What would you use to test the performance of programmatic constraint/variable creation? I have used the @benchmark macro but I get a ton of warnings about the variable being created already so I don’t think it actually represents the performance of the different ways variables and constraints can be generated.