Multi-thread Variable Creation

I have a large problem; even model creation takes a long time, so I was thinking of using a thread-safe approach to speed up the process. Here is the code:

using Random
using JuMP
using HiGHS
using ThreadSafeDicts

struct Limit
    mn::Float64
    mx::Float64
end

I = 1:24
J = [randstring(12) for i in 1:60]
K = ["sample_"*string(i) for i in 1:100_000]
L = Dict()
X = ThreadSafeDict()
Y = Dict()

limits = Dict()
for i in I, j in J
    limits[i, j] = Limit(rand(), rand())
    L[i,j] = sort([rand(), rand(), rand()])
    Threads.@threads for k in K
        X[i, j, k] = rand()
    end
end

model = Model(HiGHS.Optimizer)

BO = Dict()
ABO = Dict()
for i in I, j in J
    mn = limits[i, j].mn
    mx = limits[i, j].mx
    m = max(mn, mx)
    l = L[i,j]

    BO[i, j, 1] = @variable(model, lower_bound = mn, upper_bound = 0.0)
    BO[i, j, 4] = @variable(model, lower_bound = 0.0, upper_bound = mx)
    for seg in 2:3
        BO[i, j, seg] = @variable(model, lower_bound = mn, upper_bound = mx)
    end

    ABO[i, j] = @variable(model, lower_bound = 0.0, upper_bound = mx)

    for k in K
        x = X[i, j, k]
        idx = 3
        while idx ≥ 1 && l[idx] > x
            idx -= 1
        end
        Y[i, j, k] = BO[i, j, idx+1]
    end
end

Overall performance is unacceptable compared to AMPL, and even modifying the code to use a `Channel` would not help. (`julia> Threads.nthreads()` returns 30.)

using Random
using JuMP
using HiGHS
using ThreadSafeDicts

struct Limit
    mn::Float64
    mx::Float64
end

I = 1:24
J = [randstring(12) for i in 1:60]
K = ["sample_"*string(i) for i in 1:100_000]
L = Dict()
X = ThreadSafeDict()
Y = Dict()

limits = Dict()
for i in I, j in J
    limits[i, j] = Limit(rand(), rand())
    L[i,j] = sort([rand(), rand(), rand()])
    Threads.@threads for k in K
        X[i, j, k] = rand()
    end
end

model = Model(HiGHS.Optimizer)

BO = Dict()
ABO = Dict()
for i in I, j in J
    mn = limits[i, j].mn
    mx = limits[i, j].mx
    m = max(mn, mx)
    l = L[i,j]

    BO[i, j, 1] = @variable(model, lower_bound = mn, upper_bound = 0.0)
    BO[i, j, 4] = @variable(model, lower_bound = 0.0, upper_bound = mx)
    for seg in 2:3
        BO[i, j, seg] = @variable(model, lower_bound = mn, upper_bound = mx)
    end

    ABO[i, j] = @variable(model, lower_bound = 0.0, upper_bound = mx)

    for k in K
        x = X[i, j, k]
        idx = 3
        while idx ≥ 1 && l[idx] > x
            idx -= 1
        end
        Y[i, j, k] = BO[i, j, idx+1]
    end
end

I would appreciate any suggestions.


What is the difference between the first and second code snippet?

There are a number of performance improvements you can implement; I would recommend you take a look at Performance Tips · The Julia Language for a more complete picture.

  • Put your code in functions.
  • Use arrays instead of dictionaries; they are a LOT faster.
    For instance, the first loop that populates X takes forever (I killed it after ~30s), but the equivalent array construction `X = rand(length(I), length(J), length(K))` takes less than a second.
  • Avoid untyped containers like `Dict()`: the compiler doesn’t know the type of the keys and values, and cannot generate efficient code.
  • When building the JuMP model, create all variables at once instead of one by one.
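To illustrate the array-based suggestion, here is a hypothetical rewrite of the data-setup loop (function names and the `build_data` helper are my own; the point is that every `Dict` becomes a dense, concretely typed array indexed by integers):

```julia
using Random

# Hypothetical array-based rewrite of the data setup: the index sets become
# integer ranges, and every Dict becomes a concretely typed array.
function build_data(nI::Int, nJ::Int, nK::Int)
    limits_mn = rand(nI, nJ)                       # was limits[i, j].mn
    limits_mx = rand(nI, nJ)                       # was limits[i, j].mx
    L = [sort(rand(3)) for _ in 1:nI, _ in 1:nJ]   # breakpoints per (i, j)
    X = rand(nI, nJ, nK)                           # was the ThreadSafeDict X
    return limits_mn, limits_mx, L, X
end

mn, mx, L, X = build_data(24, 60, 100_000)
```

String keys like `"sample_17"` can be recovered when needed via `K[k]`, but the hot loops only ever touch integer indices.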

In general, you should be able to build very large problems without JuMP becoming the bottleneck. If you’re finding it slow, you can likely change how you’re modeling the problem without resorting to multithreading. (And, more importantly, JuMP doesn’t support multithreaded variable creation.)

For example, it seems like the bulk of your model could be:

using JuMP, HiGHS
I = 24
J = 60
MN = rand(I, J)
MX = 1 .+ rand(I, J)
model = Model(HiGHS.Optimizer)
@variable(model, (k == 4 ? 0 : MN[i, j]) <= BO[i=1:I, j=1:J, k=1:4] <= (k == 1 ? 0 : MX[i, j]))
@variable(model, 0 <= ABO[i=1:I, j=1:J] <= MX[i, j])

I don’t know what the K loop is doing. Why do you need to create Y?

Thank you @mtanneau for your recommendations.

  • I did use functions and typed containers; I didn’t include them here, to keep the example simpler to understand.
  • I used the incremental method in SparseVariables - Efficient sparse modeling with JuMP | Lars Hellemo | JuliaCon 2022; that is why I used Dict and incremental variable creation.

Thank you @odow

As I mentioned, I tried to mimic SparseVariables - Efficient sparse modeling with JuMP | Lars Hellemo | JuliaCon 2022, but I will do as you said.

I created Y, which holds references to existing variables, to represent an equality constraint without actually creating a constraint.
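If I understand the intent, the trick above relies on the fact that a JuMP `VariableRef` can simply be stored under a second key, so both names are the identical variable and no `y == bo` constraint is ever added. A minimal sketch (the names `bo` and `y` are illustrative):

```julia
using JuMP

model = Model()
@variable(model, 0 <= bo <= 1)

# Store the same VariableRef under another name: y and bo are the
# identical model variable, so no equality constraint is needed.
y = bo
@assert y === bo

# No affine/quadratic constraints were added to the model.
@assert num_constraints(model; count_variable_in_set_constraints = false) == 0
```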

@odow

I tried something like this, and it is very slow. Any recommendation? Thanks

@variable(model, 0 ≤ p[s=1:100000, i=1:2000] ≤ 10)

The model would have 2e8 variables. That is a very, very large problem; even if it builds, you likely can’t solve it in a reasonable time.

Why such a large problem?

That is the problem size I need to solve. I could solve it, and yes, it takes a long time. I compared AMPL vs. JuMP again, and AMPL looks faster (about five times).

JuMP took about 40 minutes on an AMD EPYC 7B13 (112 threads).

It would be great if JuMP could use multithreading for problem creation.

What solver? Did you use direct mode? Did you disable string names?

See Performance tips · JuMP
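Two of those tips can be sketched as follows (assuming HiGHS as the solver; the instance size here is shrunk purely for illustration):

```julia
using JuMP, HiGHS

# Direct mode avoids JuMP's intermediate cached copy of the problem, and
# disabling string names skips name bookkeeping for every variable created.
model = direct_model(HiGHS.Optimizer())
set_string_names_on_creation(model, false)

# A much smaller stand-in for p[s=1:100000, i=1:2000].
@variable(model, 0 <= p[1:1_000, 1:20] <= 10)
```

With string names disabled, each `p[s, i]` is created without a registered name, which matters when you are creating hundreds of millions of variables.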