Working efficiently with sparse variables in JuMP

After using JuMP mostly for smaller, nonlinear problems, I did some testing on larger LP/MILP models and noticed that problem creation can easily take a performance hit when there are many sparse variables (a common use case for us at work); see the demonstration of the issue below.

While this is fairly straightforward to work around for constraints, I found the creation of sparse variables a little cumbersome, and started playing around with some helper functions to ease the creation of sparse variables and to work with variables defined over tuples.

I would love to get advice on the best/most elegant way to do this, or alternatively suggestions for improvements on the ones I started working on.

To elaborate on the performance issue, consider the following example. The terminology may not be the best, but “naive” creates variables over the full domain with a conditional, while “sparse” efficiently creates only the needed variables:

using BenchmarkTools
using JuMP
using SparseHelper

function test_naive(N=10)
    m = Model()
    ts = [(1,1,1),(N,N,N)]
    valid = Dict(t => true for t in ts)
    @variable(m,x[i=1:N,j=1:N,k=1:N ; haskey(valid,(i,j,k))])
    return m
end

function test_sparse(N=10)
    m = Model()
    ts = [(1,1,1),(N,N,N)]
    I,J,K = sparsehelper(ts,3)
    @variable(m,x[i=I,j=J[i],k=K[i,j]])
    return m
end

function test_both()
    Ns = range(10,step=10,stop=200)
    naive = []
    sparse = []
    for n in Ns
        ntime = @belapsed test_naive($n)
        push!(naive,ntime)
        stime = @belapsed test_sparse($n)
        push!(sparse,stime)
    end
    return Ns,naive,sparse
end

N,naive,sparse = test_both()

using Plots
plot(N,naive,label="Naive") 
plot!(N,sparse,label="Sparse")

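For reference, sparsehelper builds nested index collections from the list of tuples. A minimal sketch of such a helper for the three-index case (named sparsehelper3 here to keep the example self-contained; the actual SparseHelper code also takes the dimension as an argument) could look like:

```julia
# Sketch of a three-index sparse helper: from a list of (i, j, k) tuples,
# build the first-level index set I, plus lookup tables J and K such that
# J[i] lists the valid j's for each i, and K[i, j] the valid k's for (i, j).
function sparsehelper3(tuples)
    I = unique(t[1] for t in tuples)
    J = Dict{Int,Vector{Int}}()
    K = Dict{Tuple{Int,Int},Vector{Int}}()
    for (i, j, k) in tuples
        push!(get!(J, i, Int[]), j)
        push!(get!(K, (i, j), Int[]), k)
    end
    # Deduplicate, in case several tuples share the same leading indices.
    foreach(unique!, values(J))
    foreach(unique!, values(K))
    return I, J, K
end
```

With ts = [(1,1,1),(10,10,10)], this gives I = [1, 10], J[1] == [1], and K[(10, 10)] == [10], matching how I, J, K are consumed by @variable(m, x[i=I, j=J[i], k=K[i,j]]) above.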


We’re aware of the issue, and your work-around is our suggested one. If you search Discourse you can probably find similar questions.

@variable(model, x[i=1:N, j=1:N; cond(i, j)])

effectively expands to

for i = 1:N
    for j = 1:N
        if cond(i, j)
            add_variable(model)
        end
    end
end

The conditional is in the innermost loop, so it is evaluated N³ times, which is why you’re getting polynomial behavior.

If you have time, the documentation could be expanded to warn about potential performance implications, along with the suggested work-around (pre-process the conditional to create lists of tuples): https://www.juliaopt.org/JuMP.jl/v0.20.0/variables/#variable_sparseaxisarrays-1


Thanks, I agree this should be documented; otherwise people might get the wrong impression of JuMP’s capabilities. I’ll give the documentation a shot when I get the time.

Are there any plans to add more convenient syntax for this, e.g. iteratively creating variables? The need to work with tuples or construct the necessary dictionaries might put newcomers off, even if it’s not really that difficult. Getting the dictionaries for the sparse construction right was a bit tricky for more than two indices.

To clarify, our suggested work-around is the following:

using JuMP
model = Model()
A = 1:3
B = 1:3
C = 1:5
cond(a, b, c) = a + b + c == 5
S = [(a, b, c) for a in A for b in B for c in C if cond(a, b, c)]
@variable(model, x[S])
x[(1, 1, 3)]

Note that you need the additional parentheses when indexing, since the container is indexed by the tuples in S.

In the long-term, we should investigate ways to improve the performance.

Right, sorry if my last post was confusing. Suggestion for adding to docs in this PR.

The second point is more of a wish-list item. Using a list of tuples certainly works, but isn’t as “nice” with the extra tuple/parentheses. And as demonstrated in the OP, it is already possible to create only the needed variables, so a more convenient syntax would be nice, I think.

The reason I suggest adding variables iteratively is that in my use case I will have to add all constraints iteratively anyway, and it is already possible to add bounds iteratively; the only missing part is creating the variables. This would be a natural approach coming from Mosel, where you first declare the variable space and then create only the variables you will actually need. The “;” syntax probably feels natural if you are used to GAMS’ “$” notation, but for variable creation in this use case it is a bit awkward: there isn’t really a condition to be evaluated, you typically just use the indices for which you have data (read from CSV or a database).

As I commented in the PR, another alternative is to create your own dictionary:

x = Dict()
for it in complex_list()
    if complex_condition(it)
        i, j, v = complex_indices(it)
        x[i, j, v] = @variable(model)
        # Optional: if you care about pretty printing.
        set_name(x[i, j, v], "x[$i, $j, $v]")
    end
end

This seems analogous to “first declare the variable space, and then create only the variables you will actually need” and doesn’t require any extra syntax from JuMP.


Thanks, that’s very close to what I’m looking for. However, it doesn’t seem to give feature parity with the technique in the OP, as the variables are not registered with the model. Is there a way around that?

model = Model()
N = 10
S = [(1,1,1),(N,N,N)]
# slow as it evaluates conditional N^3 times:
@variable(model, x1[i=1:N,j=1:N,k=1:N; (i,j,k) in S]) 
# fast (linear in dimension of S):
@variable(model, x2[S])
# avoid using tuples by constructing dictionary (also fast):
x3 = Dict()
for (i,j,k) in S
    x3[i,j,k] = @variable(model)
    # Optional, if you care about pretty printing:
    set_name(x3[i,j,k], "x[$i,$j,$k]")
end
show(model)

A JuMP Model
Feasibility problem with:
Variables: 6
Model mode: AUTOMATIC
CachingOptimizer state: NO_OPTIMIZER
Solver name: No optimizer attached.
Names registered in the model: x1, x2

Yes, based on the example here you can do as follows:

x3 = model[:x3] = Dict()
for (i,j,k) in S
    x3[i,j,k] = @variable(model)
    # Optional, if you care about pretty printing:
    set_name(x3[i,j,k], "x[$i,$j,$k]")
end

The semantic difference compared with the @variable constructor is that model[:x3] = Dict() won’t throw an error if there’s an existing object named :x3. Maybe we should have a “safer” interface for this.
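For example, such a “safer” registration could be sketched as follows. Here register_container! is a hypothetical helper, not part of JuMP; it just reproduces the name check that @variable performs, using object_dictionary:

```julia
using JuMP

# Hypothetical helper, not part of JuMP: attach `container` to `model` under
# `name`, refusing to overwrite an already-registered object of that name.
function register_container!(model::Model, name::Symbol, container)
    haskey(object_dictionary(model), name) &&
        error("An object of name $name is already attached to this model.")
    model[name] = container
    return container
end

model = Model()
x3 = register_container!(model, :x3, Dict())
for (i, j, k) in [(1, 1, 1), (10, 10, 10)]
    x3[i, j, k] = @variable(model, base_name = "x3[$i,$j,$k]")
end
```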

Great! That should work, thanks a lot!

It is perhaps a bit ironic that this is basically a way to bypass the error I ran into in my very first attempt to do this with @variable, though. While I do understand that creating a variable x would be considered an error if you already have x, y, and z defined, I wouldn’t consider creating x_{1,2} an error just because x_{1,1} is already defined. Maybe checking at that level of granularity is not practical in JuMP, but perhaps optionally disabling that check would make sense?

Using anonymous variables is precisely what’s recommended if you don’t want an error when creating two variables with the same symbolic name.

You should read @variable(m, x[S]) as creating a set of variables named x indexed over S. You can’t have two sets of variables with the same name (as far as JuMP is aware of the name), and the index sets must be concretely specified when the variables are created so that JuMP can create x using the correct container type.

I see; the point I was trying to make was that, from a user’s point of view, it’s not a logical error to add another variable with the same name but a different index, so it might be a bit puzzling when this generates an error (for implementation reasons).

I’ll stop whining about this, since there are good work-arounds. But the bottom line is that this is a rather common use case in my experience, so having a performant and easy syntax in the vein of @variable(model, x[i,j,k] for (i,j,k) in S) would be even better (not considering how difficult that would be to implement).
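In the meantime, a dictionary comprehension gets fairly close to that wished-for syntax. A sketch, assuming S is a list of index tuples as in the earlier examples: only the variables indexed by S are created, keyed by their index tuples:

```julia
using JuMP

model = Model()
S = [(1, 1, 1), (2, 3, 4)]

# One anonymous variable per index tuple in S, collected in a Dict.
x = Dict((i, j, k) => @variable(model, base_name = "x[$i,$j,$k]")
         for (i, j, k) in S)
```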