Is it possible to incorporate the relational algebra technique into JuMP in the future to expedite model generation?

Recently, I came across a post titled “Performance in Optimization Models: A Comparative Analysis of GAMS, Pyomo, GurobiPy, and JuMP” (source: GAMS Blog) where it was observed that the commercial software GAMS demonstrated superior performance compared to JuMP and other modeling languages in terms of model generation time. The technique utilized in GAMS, known as relational algebra, played a significant role in achieving these results. As a Ph.D. student specializing in process system engineering and utilizing both GAMS and JuMP, I am curious to know if there are plans to incorporate the relational algebra technique into JuMP in the future or if it is deemed necessary to do so.

1 Like

The example mentioned uses a data structure that is poorly suited to Julia (vectors of vectors of strings). The slow “JuMP” line uses an O(N^2) algorithm for constructing the constraint; the faster version uses O(N). If you used a different data structure, you could likely improve things again.
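
To make that concrete, here is a minimal sketch of the two constructions on a toy model (a flat list of (i, j) String pairs with one constraint per i); the data and names are made up for illustration, not taken from the blog’s benchmark:

    using JuMP

    # Toy data: index pairs stored as String tuples.
    IJ = [(string("i", i), string("j", j)) for i in 1:100 for j in 1:10]
    I = unique(first.(IJ))

    # O(N^2)-style construction: every constraint re-scans the whole list of pairs.
    model = Model()
    x = Dict(ij => @variable(model, lower_bound = 0) for ij in IJ)
    for i in I
        @constraint(model, sum(x[ij] for ij in IJ if ij[1] == i) <= 1)
    end

    # O(N)-style construction: group the pairs by i once, then each constraint
    # only touches its own group.
    groups = Dict{String,Vector{Tuple{String,String}}}()
    for ij in IJ
        push!(get!(groups, ij[1], Tuple{String,String}[]), ij)
    end
    model2 = Model()
    x2 = Dict(ij => @variable(model2, lower_bound = 0) for ij in IJ)
    for (i, ijs) in groups
        @constraint(model2, sum(x2[ij] for ij in ijs) <= 1)
    end

With the grouping, each pair is visited a constant number of times, so the total construction work grows with the number of pairs rather than with pairs times constraints.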

There’s a different example that they ran (but didn’t talk about) which shows JuMP is much more competitive with GAMS.

I guess the performance depends on your specific model. I wouldn’t worry about the differences unless you find that JuMP is too slow for your needs.

I am curious to know if there are plans to incorporate the relational algebra technique into JuMP in the future or if it is deemed necessary to do so.

There are no plans to do so.

3 Likes

Ok, Thx!

I tried their benchmark. With n = 2800, 92% of the time is spent on this line

    x_list = [
        (i, j, k, l, m)
        for (i, j, k) in IJK
        for (jj, kk, l) in JKL if jj == j && kk == k
        for (kkk, ll, m) in KLM if kkk == k && ll == l
    ]

which is not even using JuMP. Let n, m, and l be the sizes of the arrays IJK, JKL, and KLM.
The complexity of this algorithm is O(n * m * l). It is actually possible to reduce the complexity considerably.
In O(n + m), you can merge the first two with something like:

    # Group the i's of IJK by their (j, k) pair.
    jk_i = Dict{Tuple{String,String},Vector{String}}()
    for (i, j, k) in IJK
        if !haskey(jk_i, (j, k))
            jk_i[(j, k)] = String[]
        end
        push!(jk_i[(j, k)], i)
    end
    # Scan JKL once and emit one tuple per matching i; (j, k) pairs with no match are skipped.
    IJKL = NTuple{4,String}[]
    for (j, k, l) in JKL
        for i in get(jk_i, (j, k), String[])
            push!(IJKL, (i, j, k, l))
        end
    end

Suppose the length of the resulting array IJKL is o; the complexity of this algorithm is O(n + m + o). You can then merge IJKL with KLM in a similar fashion, with complexity O(o + l + p), where p is the length of the final array (see the sketch below).
So the final complexity is O(m + n + l + o + p), which is much smaller than O(m * n * l)!
So if I’m not mistaken, the benchmark is showing that the algorithm of complexity O(m * n * l) is inefficient (I assume the Python and GAMS versions use a better algorithm); this has nothing to do with JuMP.
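
For concreteness, here is a sketch of that second merge (IJKL with KLM), under the same assumptions about the data (vectors of String tuples); the names kl_m and IJKLM are mine:

    # Index KLM by its (k, l) pair.
    kl_m = Dict{Tuple{String,String},Vector{String}}()
    for (k, l, m) in KLM
        if !haskey(kl_m, (k, l))
            kl_m[(k, l)] = String[]
        end
        push!(kl_m[(k, l)], m)
    end
    # Scan IJKL once and emit one tuple per matching m; (k, l) pairs with no match are skipped.
    IJKLM = NTuple{5,String}[]
    for (i, j, k, l) in IJKL
        for m in get(kl_m, (k, l), String[])
            push!(IJKLM, (i, j, k, l, m))
        end
    end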

4 Likes

I guess one point is that the code you wrote is more complicated than the original, which was the underlying theme of the blogpost.

1 Like

Sure, but they introduce the fast_JuMP code with “With additional research and effort, it is possible to find alternative implementations that outperform the intuitive approach”.
So for the “slow JuMP” version, I understand that they just wrote the naive approach, but with “additional research and effort” you don’t expect more than 90% of the time to still be spent on a line that is not even using JuMP.
Once you figure out that this line is the bottleneck, it’s not premature optimization to make it a bit more complicated.
By the way, since this is essentially what is called an inner join of three tables, you can also get linear complexity using DataFrames.jl and Query.jl:

    using DataFrames
    # One DataFrame per tuple vector, with the index positions as named columns.
    ijk = DataFrame(i = getindex.(IJK, 1), j = getindex.(IJK, 2), k = getindex.(IJK, 3))
    jkl = DataFrame(j = getindex.(JKL, 1), k = getindex.(JKL, 2), l = getindex.(JKL, 3))
    klm = DataFrame(k = getindex.(KLM, 1), l = getindex.(KLM, 2), m = getindex.(KLM, 3))

    using Query
    # Inner join the three tables on their shared columns.
    @from a in ijk begin
        @join b in jkl on (a.j, a.k) equals (b.j, b.k)
        @join c in klm on (b.k, b.l) equals (c.k, c.l)
        @select {a.i, a.j, a.k, b.l, c.m}
        @collect DataFrame
    end

5 Likes