How to speed up Ipopt with HSL MA86 using parallel processing

Hi community, I was previously using Ipopt with MUMPS (the default linear solver) to solve my problem, but the performance was a bit slow. I obtained an academic license for HSL, downloaded and installed it successfully as described in the documentation, and switched the linear solver to MA86 with set_attribute(model, "linear_solver", "ma86"). However, I haven’t seen much improvement in performance. I noticed that MUMPS supports only one thread by default, whereas MA86 supports parallel processing as described. I also haven’t seen much change in CPU usage. Did I miss something? How should I set things up so that MA86 uses all my cores?
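For reference, my configuration is essentially the snippet from the Ipopt.jl documentation (with the hsllib path taken from HSL_jll.jl):

using JuMP, Ipopt
import HSL_jll

model = Model(Ipopt.Optimizer)
# Point Ipopt at the HSL shared library, then select MA86.
set_attribute(model, "hsllib", HSL_jll.libhsl_path)
set_attribute(model, "linear_solver", "ma86")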
When I try to set the NumberOfThreads attribute for Ipopt, it seems this option is not supported. I also tried SPRAL, but as with MA86 I didn’t notice a significant improvement, and CPU usage didn’t change noticeably either.
Additionally, I noticed that building the model requires a very large amount of memory (around 30 GB) and takes quite a long time (about an hour) before the solver even starts. What could be the potential reasons for this? I tried some of the suggestions from the performance tips, such as setting add_bridges = false, but did not see a significant improvement. What should I pay attention to during modeling to avoid these situations?
:slight_smile:

Can you provide a reproducible example? An hour to build the model is excessive. There is almost certainly something that can be improved.


Hi @Chi_Jan!
I’m glad that you are using libHSL and HSL_jll.jl.
The linear solvers MA27 and MA57 are generally more efficient than the others, but if you want to use MA86 or MA97, you must be careful not to pair them with a multithreaded BLAS / LAPACK.
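For example, something along these lines (a minimal sketch; MA86 is parallelized with OpenMP, so OMP_NUM_THREADS controls its threads, while the BLAS is pinned to a single thread so the two thread pools don’t compete):

using LinearAlgebra

# Pin Julia's BLAS / LAPACK to one thread so it does not compete with
# the OpenMP threads that MA86 launches during the factorization.
BLAS.set_num_threads(1)

The OpenMP thread count is safest to set in the shell before launching Julia (e.g. export OMP_NUM_THREADS=8); setting ENV["OMP_NUM_THREADS"] from inside a running session may be too late, depending on when the OpenMP runtime is initialized.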


Hi, here is a minimal runnable example of my problem. The actual data in my use case is larger.

using JuMP, Ipopt
using IterTools: product

F = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
R = ["A11", "A12", "A13", "A21", "A22", "A23", "A31", "A32", "A33", "B11", "B12", "B13", "B21", "B22", "B23", "B31", "B32", "B33", "C11", "C12", "C13", "C21", "C22", "C23", "C31", "C32", "C33"]
I = ["I1", "I2", "I3"]
T = ["T1", "T2", "T3"]
D = ["D1", "D2", "D3"]

M1 = Dict((r, i, t, d) => rand() for (r, i, t, d) in product(R, I, T, D))
M2 = Dict((r, i, t, d, f) => rand() for (r, i, t, d, f) in product(R, I, T, D, F))
M3 = Dict((r, i, t, d, f) => rand() for (r, i, t, d, f) in product(R, I, T, D, F))
M4 = Dict((r, i, t, d, f) => rand() for (r, i, t, d, f) in product(R, I, T, D, F))
candidate_list = Dict((r, i, t, d) => [f for f in F if rand() > 0.1] for (r, i, t, d) in product(R, I, T, D))

RI_lower = Dict((f, r, i) => rand(0:5) / 10 for f in F, r in R, i in I)
RI_upper = Dict((f, r, i) => rand(5:10) / 10 for f in F, r in R, i in I)
RA_lower = Dict((f, r) => rand(0:5) / 10 for f in F, r in R)
RA_upper = Dict((f, r) => rand(5:10) / 10 for f in F, r in R)
IS_lower = Dict((f, i) => rand(0:5) / 10 for f in F, i in I)
IS_upper = Dict((f, i) => rand(5:10) / 10 for f in F, i in I)
P_lower = Dict((f) => rand(0:5) / 10 for f in F)
P_upper = Dict((f) => rand(5:10) / 10 for f in F)
x_avg_upper = Dict((f) => rand(5:10) / 10 for f in F)
x_avg_lower = Dict((f) => rand(0:5) / 10 for f in F)


Containers.@container(matrix_1[r=R, i=I, t=T, d=D], get(M1, (r, i, t, d), 0))
Containers.@container(matrix_2[r=R, i=I, t=T, d=D, f=F], get(M2, (r, i, t, d, f), 0))
Containers.@container(matrix_3[r=R, i=I, t=T, d=D, f=F], get(M3, (r, i, t, d, f), 0))
Containers.@container(matrix_4[r=R, i=I, t=T, d=D, f=F], get(M4, (r, i, t, d, f), 0))

model = Model(Ipopt.Optimizer)
set_attribute(model, "max_iter", 1)
set_attribute(model, "print_level", 5)
# set_attribute(model, "linear_solver", "spral")

@variable(model, 0 <= decision_var[f=F, r=R, i=I, t=T, d=D] <= 1, start = 0.005)


# Fix to zero every decision_var whose f is not in the cell's candidate list,
# and everything in cells that have no candidates at all.
for f = F, r = R, i = I, t = T, d = D
    if length(candidate_list[r, i, t, d]) == 0
        @constraint(model, decision_var[f, r, i, t, d] == 0)
        continue
    end

    if f == "NULL" || f in candidate_list[r, i, t, d]
        continue
    end
    @constraint(model, decision_var[f, r, i, t, d] == 0)
end

# Each cell with candidates must allocate (approximately) its full weight.
for r = R, i = I, t = T, d = D
    if length(candidate_list[r, i, t, d]) == 0
        continue
    end
    @constraint(model, 0.99 <= sum([decision_var[f, r, i, t, d] for f in candidate_list[r, i, t, d]]) <= 1.01)
end

# Keep the ratio AB * (1 - CB) - CB within the band [P_lower, P_upper] for each f.
for f in F
    if f == "NULL"
        continue
    end
    B = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
    A = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_3[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
    C = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_4[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
    AB = A / B
    CB = C / B
    expr = AB * (1 - CB) - CB
    @constraint(model, P_lower[f] <= expr <= P_upper[f])

end

# x_avg[f]: total matrix_1-weighted allocation of f; each f's share of the
# grand total is bounded below.
@expression(model, x_avg[f in F], sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] for (r, i, t, d) in product(R, I, T, D)]))
FF = [f for f in F if f != "NULL"]
total_x_avg = sum(x_avg)
@constraints(model, begin
    [f in F], x_avg[f] - x_avg_upper[f] * total_x_avg <= 0
    [f in FF], x_avg[f] - x_avg_lower[f] * total_x_avg >= 0
end)

# IS[f, i]: f's weighted allocation within i, bounded as a share of f's total across all i.
@expression(model, IS[f=F, i=I], sum([decision_var[f, r, ii, t, d] * matrix_1[r, ii, t, d] for (r, ii, t, d) in product(R, I, T, D) if ii == i]))
@constraints(model, begin
    [(f, i) in product(FF, I)], IS[f, i] - sum(IS[f, :]) * get(IS_upper, (f, i), 1) <= 0
    [(f, i) in product(FF, I)], IS[f, i] - sum(IS[f, :]) * get(IS_lower, (f, i), 0) >= 0
end)


# RA[f, rr]: f's weighted allocation within region group "A", "B", or "C".
@expression(model, RA[f=F, rr=["A", "B", "C"]], sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] for r in R if occursin(rr, r) for i in I for t in T for d in D]))
for f in F
    if f == "NULL"
        continue
    end
    for r in R
        if occursin("A", r)
            @constraints(model, begin
                RA[f, "A"] - sum(RA[f, :]) * get(RA_upper, (f, "A"), 1) <= 0
                RA[f, "A"] - sum(RA[f, :]) * get(RA_lower, (f, "A"), 0) >= 0
            end)
        elseif occursin("B", r)
            @constraints(model, begin
                RA[f, "B"] - sum(RA[f, :]) * get(RA_upper, (f, "B"), 1) <= 0
                RA[f, "B"] - sum(RA[f, :]) * get(RA_lower, (f, "B"), 0) >= 0
            end)
        elseif occursin("C", r)
            @constraints(model, begin
                RA[f, "C"] - sum(RA[f, :]) * get(RA_upper, (f, "C"), 1) <= 0
                RA[f, "C"] - sum(RA[f, :]) * get(RA_lower, (f, "C"), 0) >= 0
            end)
        end
    end
end

# RI[f, rr, i]: like RA, but additionally split by i.
@expression(model, RI[f=F, rr=["A", "B", "C"], i=I], sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] for r in R for t in T for d in D if occursin(rr, r)]))
for f in F
    if f == "NULL"
        continue
    end
    for i in I
        @constraints(model, begin
            RI[f, "C", i] <= sum(RI[f, :, i]) * get(RI_upper, (f, "C", i), 1)
            RI[f, "B", i] <= sum(RI[f, :, i]) * get(RI_upper, (f, "B", i), 1)
            RI[f, "A", i] <= sum(RI[f, :, i]) * get(RI_upper, (f, "A", i), 1)

            RI[f, "C", i] >= sum(RI[f, :, i]) * get(RI_lower, (f, "C", i), 0)
            RI[f, "B", i] >= sum(RI[f, :, i]) * get(RI_lower, (f, "B", i), 0)
            RI[f, "A", i] >= sum(RI[f, :, i]) * get(RI_lower, (f, "A", i), 0)
        end)
    end
end





# Objective: allocation-weighted absolute deviation of each f's ratio from
# the midpoint of its [P_lower, P_upper] band.
function obj_fn()
    total_abs_bias = 0
    total = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] for f in F if f != "NULL" for (r, i, t, d) in product(R, I, T, D)])
    for f in F
        if f == "NULL"
            continue
        end
        B = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
        A = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_3[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
        C = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_4[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
        AB = A / B
        CB = C / B
        IBR = AB * (1 - CB) - CB
        weight = B / total
        bias = IBR - (P_lower[f] * 0.5 + P_upper[f] * 0.5)
        total_abs_bias += weight * abs(bias)
    end
    return total_abs_bias
end
obj = obj_fn()

@objective(model, Min, obj)
@time optimize!(model)

Here is the Ipopt log:


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:      524
Number of nonzeros in inequality constraint Jacobian.:   254383
Number of nonzeros in Lagrangian Hessian.............: 14885451

Total number of variables............................:     5103
                     variables with only lower bounds:        0
                variables with lower and upper bounds:     5103
                     variables with only upper bounds:        0
Total number of equality constraints.................:      524
Total number of inequality constraints...............:     1296
        inequality constraints with only lower bounds:      280
   inequality constraints with lower and upper bounds:      736
        inequality constraints with only upper bounds:      280

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  8.2302015e-01 2.21e+01 4.54e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  8.2295431e-01 2.21e+01 1.11e+02  -1.0 2.18e+01    -  1.48e-02 1.41e-04h  1

Number of Iterations....: 1

                                   (scaled)                 (unscaled)
Objective...............:   8.2295430663584179e-01    8.2295430663584179e-01
Dual infeasibility......:   1.1124239213227227e+02    1.1124239213227227e+02
Constraint violation....:   2.2126797171992425e+01    2.2126797171992425e+01
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   6.5691602361597337e+00    6.5691602361597337e+00
Overall NLP error.......:   1.1124239213227227e+02    1.1124239213227227e+02


Number of objective function evaluations             = 2
Number of objective gradient evaluations             = 2
Number of equality constraint evaluations            = 2
Number of inequality constraint evaluations          = 2
Number of equality constraint Jacobian evaluations   = 2
Number of inequality constraint Jacobian evaluations = 2
Number of Lagrangian Hessian evaluations             = 1
Total seconds in IPOPT                               = 16.370

EXIT: Maximum Number of Iterations Exceeded.
620.543634 seconds (209.70 M allocations: 19.269 GiB, 9.54% gc time, 0.22% compilation time: 7% of which was recompilation)

Based on the description in the table, shouldn’t MA86 be faster due to its higher degree of parallelism? :slight_smile:

It depends on the size of your problems and the kind of parallelism you can exploit.

If your linear systems are highly sparse, such as ACOPF problems, you may not gain any benefit from using a solver other than MA27 (the oldest one) and could waste time trying to optimize block operations and exploit non-existent parallelism.

MA57 is designed to use BLAS3 operations and employs very efficient low-level kernels that can leverage CPU vendor SIMD acceleration.

Ultimately, the question is whether your problems can amortize the extra overhead of parallelism in linear solvers (such as the cost of launching threads in MA86).

Different forms of parallelism have varying latencies:
No parallelism < SIMD < Multi-threaded < MPI

Thank you for your explanation. It has given me a basic understanding of these solvers. I will try using MA57 on my problem to see if it is more efficient than MA86.
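If I understand correctly, that should just be a matter of swapping the option, with everything else unchanged:

set_attribute(model, "linear_solver", "ma57")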

I thought this model looked familiar:

Your build issue is constraints like this:

for f in F
    if f == "NULL"
        continue
    end
    B = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
    A = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_3[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
    C = sum([decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_4[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)])
    AB = A / B
    CB = C / B
    expr = AB * (1 - CB) - CB
    @constraint(model, P_lower[f] <= expr <= P_upper[f])
end

JuMP doesn’t (currently) exploit repeated subexpressions, so even though you have only 5103 variables, there are 14,885,451 terms in the Hessian. That’s a lot!
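One workaround is to lift those repeated sums into auxiliary variables with linear defining constraints, so that the nonlinear ratio touches only three variables per f. A sketch (untested on your full model; Baux, Aaux, and Caux are names I made up):

FF = [f for f in F if f != "NULL"]
@variable(model, Baux[f in FF] >= 1e-6)  # keep the denominator away from zero
@variable(model, Aaux[f in FF])
@variable(model, Caux[f in FF])
for f in FF
    # The defining constraints are linear: they add dense Jacobian rows
    # but contribute nothing to the Lagrangian Hessian.
    @constraint(model, Baux[f] == sum(decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)))
    @constraint(model, Aaux[f] == sum(decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_3[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)))
    @constraint(model, Caux[f] == sum(decision_var[f, r, i, t, d] * matrix_1[r, i, t, d] * matrix_2[r, i, t, d, f] * matrix_4[r, i, t, d, f] for (r, i, t, d) in product(R, I, T, D)))
    # The only nonlinearity left involves Aaux[f], Baux[f], and Caux[f].
    @constraint(model, P_lower[f] <= (Aaux[f] / Baux[f]) * (1 - Caux[f] / Baux[f]) - Caux[f] / Baux[f] <= P_upper[f])
end

The same lifting applies to A, B, and C inside your objective function. The Hessian then has a handful of nonzeros per f instead of millions.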