Speed-up JuMP model storing objects as Indexed tables

Hello,

I’m new to Julia/JuMP and working on a very large optimization model, whose generation I aim to fully automate based on sets, relations among sets, and parameters provided externally. I developed a basic procedure for this purpose that mainly builds on storing variables, parameters and constraints in IndexedTables to enable fast grouping and filtering. Since changing these things will be very time-consuming once I have coded the actual model, I was hoping to get some feedback beforehand.

In the example, demand for a certain energy carrier has to be satisfied at any point of time by production from a set of technologies.

using JuMP
using JuliaDB
using GLPK
model = Model()

# Creation of parameters and mappings among sets
Mapping_dic = Dict{Symbol, Array{Union{Int,Array{Int,1}},1}}()
# sets of timesteps and energy carriers
Mapping_dic[:Time]  = Array(1:4200)
Mapping_dic[:Carrier] 	= Array(1:2)
# sets of technologies mapped to carriers, techs in the first subarray can produce the first carrier and so on
Mapping_dic[:Tech]	= [[1,2,3,4],[2,3,5,6,7]] 
# parameter with demand to be satisfied
Parameter_tab = table(repeat(Array(1:4100),2),vcat(fill(1,4100),fill(2,4100)),fill(3.0,8200),names=[:Time,:Carrier,:Val], pkey = [:Time, :Carrier])


# Creation of production variables based on Mapping_dic
Variable_info = VariableInfo(true, 0, false, NaN, false, NaN, false, NaN, false, false)
Variables_tab = table(Int16[], Int16[], Int16[], VariableRef[], names=[:Time, :Carrier, :Tech, :Ref], pkey = [:Time, :Carrier])

function generate_variables(model,Mapping_dic,Variables_tab,Variable_info)
	for time::Int16 = Mapping_dic[:Time], carrier::Int16 = Mapping_dic[:Carrier], tech::Int16 = Mapping_dic[:Tech][carrier]
		push!(rows(Variables_tab), (Time = time, Carrier = carrier, Tech = tech, Ref = JuMP.add_variable(model, JuMP.build_variable(error, Variable_info))))
	end
	return Variables_tab
end

Variables_tab = generate_variables(model,Mapping_dic,Variables_tab,Variable_info)


# Creation of constraints based on variables generated earlier and parameters provided externally
Constraints_tab = table(Int16[], Int16[], ScalarConstraint[], names=[:Time, :Carrier, :Ref], pkey = [:Time, :Carrier])
ConstraintsInfo_tab = join(JuliaDB.groupby(unique, Variables_tab, (:Time, :Carrier), select=:Ref), Parameter_tab, how=:inner)

function generate_constraints(Constraints_tab,ConstraintsInfo_tab)
	for i::NamedTuple{(:Time, :Carrier, :unique, :Val),Tuple{Int64,Int64,Array{VariableRef,1},Float64}} in rows(ConstraintsInfo_tab)
		push!(rows(Constraints_tab), (Time = i[1], Carrier = i[2], Ref = @build_constraint(sum(i[3]) == i[4])))
	end
	return Constraints_tab
end

Constraints_tab = generate_constraints(Constraints_tab,ConstraintsInfo_tab)


# Adding all constraints to the actual model, adding a dummy objective function and solving
for i in select(Constraints_tab,:Ref) JuMP.add_constraint(model,i) end
@objective(model, Min, Variables_tab[1][:Ref])
JuMP.optimize!(model,with_optimizer(GLPK.Optimizer))

The first part of the code, the creation of mappings and parameters, is hardcoded for testing purposes and would be computed from input data in the actual model. As you can see, the nested loop within the generate_variables function builds on the mapping defined earlier and avoids generating unnecessary variables for non-existing technology/carrier combinations. Afterwards, grouping the table of variables and joining it with the parameter table allows the automatic generation of all relevant constraints.

I hope the example is clear. I thought about preallocating the Indexed tables instead of pushing to increase performance, but could not find any corresponding methods.
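
One possible way to avoid row-wise pushing, sketched here in plain Julia, is to count the rows first, fill preallocated column vectors, and build the table once at the end. This assumes the table can be constructed from complete columns, as the table(...) call for Parameter_tab above does. The helper name build_columns is hypothetical and not part of any package.

```julia
# Hypothetical helper: fill preallocated index columns from the mapping,
# so that a table can be built in one call instead of row by row.
function build_columns(times::Vector{Int}, carriers::Vector{Int}, techs::Vector{Vector{Int}})
    # Total number of rows: one per (time, carrier, tech) combination that exists.
    n = length(times) * sum(length(techs[c]) for c in carriers)
    time_col = Vector{Int16}(undef, n)
    carrier_col = Vector{Int16}(undef, n)
    tech_col = Vector{Int16}(undef, n)
    k = 0
    for t in times, c in carriers, tech in techs[c]
        k += 1
        time_col[k] = t
        carrier_col[k] = c
        tech_col[k] = tech
    end
    return time_col, carrier_col, tech_col
end

# Small example with 3 timesteps and the carrier/technology mapping from the post.
time_col, carrier_col, tech_col = build_columns(collect(1:3), [1, 2], [[1, 2, 3, 4], [2, 3, 5, 6, 7]])
```

The resulting columns, together with an equally long vector of variable references, could then be handed to table(...) in a single call.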

I’m not sure I understand what your actual question or request is.
From what I understand, you fear that model generation will take a long time for your large dataset if the “usual” Array- or Dict-based data structures are used. Therefore, you want to try IndexedTables instead, for faster indexing and grouping.

I had planned on doing some benchmarks with this as well, but did not get around to it, unfortunately.
When variables are created with JuMP using the macro syntax, the user has some control over the data structure that is used (see docs).
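
As a minimal sketch of that control, the @variable macro accepts a container keyword. The variable names x, y, z and the index sets below are made up for illustration.

```julia
using JuMP

model = Model()

# Plain 1:n ranges give an ordinary Array by default.
@variable(model, x[1:2, 1:3] >= 0)

# The `container` keyword forces a specific container type.
@variable(model, y[1:2, 1:3] >= 0, container = DenseAxisArray)

# A condition on the indices produces a SparseAxisArray.
@variable(model, z[i = 1:2, j = 1:3; i <= j] >= 0)
```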

As for push!ing data into IndexedTables, I found that it does some buffering internally. For example, see flush! where this can be forced.

If you do benchmarking with different containers, please do share the results :pleading_face:

My question is whether there are any suggestions to further speed up the code I posted. For instance, I’m still getting a lot of type instabilities when applying @code_warntype to my functions, but being quite new to Julia, I have no idea if and how they could be resolved. Thanks for the tip on flush!. I will look into that.

The use of IndexedTables as a container is something I’m very confident in. Timing the generation of 26 million variables and 4.2 million constraints as done in the example above, these are the results for generate_variables and generate_constraints respectively:

28.369719 seconds (181.34 M allocations: 6.793 GiB, 26.20% gc time)
98.574473 seconds (182.60 M allocations: 10.570 GiB, 74.65% gc time)

I cannot provide you with a structured numerical benchmark, but I’ve tested JuMP’s dense and sparse axis containers, SparseArrays and DataFrames, and nothing came near the performance I’m achieving now.

JuMP’s dense axis container was not suited, because the variables I aim to create are very sparse (see the example, where each technology can only provide specific energy carriers). JuMP’s sparse axis container is actually a dictionary using Tuples as keys. As a result, filtering and grouping variables for specific sets is very time-consuming. SparseArrays have the disadvantage of being limited to two dimensions. Lastly, DataFrames were the next best option, but are still outperformed by IndexedTables in my case.
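
To illustrate the point about tuple-keyed dictionaries, here is a plain-Julia sketch. It is not JuMP's actual container: the Dict and its Float64 values are stand-ins for a sparse container of variable references. Selecting or grouping entries by one index means scanning every key.

```julia
# Stand-in for a sparse container: a Dict keyed by (time, carrier, tech).
techs = [[1, 2, 3, 4], [2, 3, 5, 6, 7]]
store = Dict{Tuple{Int,Int,Int},Float64}()
for t in 1:3, c in 1:2, tech in techs[c]
    store[(t, c, tech)] = 0.0
end

# Filtering by carrier touches every key, since the Dict is not organised by carrier.
carrier2 = [k for k in keys(store) if k[2] == 2]

# Grouping by (time, carrier) likewise needs a full pass over all keys.
groups = Dict{Tuple{Int,Int},Vector{Int}}()
for (t, c, tech) in keys(store)
    push!(get!(groups, (t, c), Int[]), tech)
end
```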

Thanks for clearing that up.

It’s been a while since I used IndexedTables, so your code is not immediately obvious to me.
One Julia performance gotcha is benchmarking code that is not inside a function (which is the compilation boundary).

So, maybe try to put the code where Mapping_dic etc. are defined inside a function as well and run that. Not sure if that could be the source of type warnings.
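
A small plain-Julia sketch of this advice, with made-up function names: the same loop is easier for the compiler to specialise when the data is passed as a function argument rather than read from a non-constant global.

```julia
# Untyped global: the compiler cannot assume its type inside functions that read it.
data = collect(1:1_000)

# Reads the global directly; @code_warntype typically flags this as unstable.
function sum_global()
    total = 0
    for v in data
        total += v
    end
    return total
end

# Takes the data as an argument; the loop is specialised for the argument's type.
function sum_arg(values)
    total = 0
    for v in values
        total += v
    end
    return total
end

@time sum_global()
@time sum_arg(data)
```

Running @code_warntype on both functions is one way to compare what the compiler can infer in each case.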

JuMP is meant to be flexible enough so that you can use your own data structures for storing the variables and constraints if the defaults aren’t a good fit, so while this code doesn’t look like a typical JuMP model, JuMP is working as designed. I’m not familiar enough with IndexedTables to comment on your use of it, however. In terms of the JuMP usage,

@variable(model, lower_bound = 0) is equivalent to the above expression. There is no need to reference internal-ish objects like build_variable and VariableInfo, and the macro form is kinder to readers of the code.
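
As a minimal sketch of how this could look inside a loop (the vector refs is a made-up stand-in for the table column of references):

```julia
using JuMP

model = Model()

# Anonymous variables with a lower bound of 0, created without touching
# build_variable or VariableInfo.
refs = VariableRef[]
for _ in 1:3
    push!(refs, @variable(model, lower_bound = 0))
end
```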

What is the motivation for constructing constraints separately from adding them to the model (e.g., calling @constraint)?

Generating variables with the macro takes slightly longer than using the internal objects directly, which is why I preferred the latter. There is no real reason to separate construction of constraints from adding them. Thanks for pointing that out.
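
For reference, a minimal sketch of building and adding a constraint in one step with @constraint. The names refs and demand are placeholders for one group of variable references and its demand value.

```julia
using JuMP

model = Model()
refs = [@variable(model, lower_bound = 0) for _ in 1:4]
demand = 3.0

# Build the constraint and add it to the model in one step;
# the macro returns a reference to the constraint stored in the model.
con = @constraint(model, sum(refs) == demand)
```

With this approach, the :Ref column of the constraints table would presumably hold constraint references rather than ScalarConstraint objects.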