Best Nonlinear Optimizer for a Continuous Convex Problem

How about this?

using JuMP, Ipopt, LinearAlgebra, Random, Gurobi, BenchmarkTools, MosekTools, NLopt

const Ι = 250  #This is Greek Letter Iota
const J = 50
const p = 5
const k = 0.75

function distmat(J,Ι)
    Distances = zeros(J,Ι)
    Random.seed!(7777)
    coordinates_x = hcat([x for x = 1:J],fill(0.0,J))
    coordinates_y = hcat([x for x = 1:Ι].*J/Ι,fill(0.0,Ι))
    for j = 1:J, l=1:Ι
        Distances[j,l] = sqrt((coordinates_x[j,1]-coordinates_y[l,1])^2+(coordinates_x[j,2]-coordinates_y[l,2])^2)
    end
    return 1 .+ .1*Distances
end

const t = distmat(J,Ι)



function func(x)
    return sum(sum(x[j,i] for i = 1:Ι)^((p-1)/p) for j = 1:J) -
           sum(sum(t[j,i]*x[j,i] for j = 1:J)^(1/k) for i = 1:Ι)
end



function gradient!(x,g)
    X = sum(x,dims=2)      # X[j] = sum_i x[j,i]
    Y = sum(t.*x,dims=1)'  # Y[i] = sum_j t[j,i]*x[j,i]
    grad = zeros(J,Ι)

    for i = 1:Ι
        for j = 1:J
            grad[j,i] = (p-1)/p * X[j]^(-1/p) - t[j,i]/k * Y[i]^((1-k)/k)
        end
    end

    # Write into the preallocated vector; `g = vec(grad)` would only rebind the
    # local name and leave the caller's vector untouched.
    g .= vec(grad)
    return g
end


x0 = rand(J,Ι)

func(x0)

gradient!(x0,zeros(J*Ι))
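
As a quick sanity check (a sketch using only the definitions above), the analytic gradient can be compared against a central finite difference at a few random entries:

# Sanity check (sketch): compare the analytic gradient with a central finite
# difference at a few randomly chosen entries of x0.
g = zeros(J*Ι)
gradient!(x0, g)
h = 1e-6
for idx in rand(1:J*Ι, 5)
    e = zeros(J,Ι)
    e[idx] = h                                   # perturb one entry (column-major index)
    fd = (func(x0 .+ e) - func(x0 .- e)) / (2h)  # central difference
    println("entry ", idx, ": analytic = ", g[idx], ", finite difference = ", fd)
end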





Thanks for the details. A couple of questions:

  • why did you disable function scaling (the nlp_scaling_method parameter)?
  • what does the optimal x^* look like? Lots of (near) 0 as expected?
  • for a given instance, do you have an a priori idea of what the solution is going to look like?
  • edit: can you post the log of Ipopt, where we see how the objective value evolves over the iterations?

Another thing to try is using a different linear solver. HSL etc can be much faster than mumps: https://github.com/jump-dev/Ipopt.jl#linear-solvers
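
For example, if the model is built through JuMP, switching Ipopt's linear solver is a one-line option (MA57 shown here; HSL has to be installed separately, as described in the link above, and `model` is a placeholder name):

# Sketch: point Ipopt at a different linear solver (requires HSL to be installed).
set_optimizer_attribute(model, "linear_solver", "ma57")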


Sure,
a) I am not sure why I am disabling the scaling. It is probably a legacy of recycled code. However, if I comment out that line, the solution to the problem is garbage.

b) Yes, lots of 0s. Of the 12,500 entries, 98% are 0s.

c) I am not sure how to answer this question. I can characterize the solution sharply (in terms of a closed form) only if I assume knife-edge cases for the parameter k. I can characterize somewhat more sharply the sums of the x's, either across j or across i.

d) Yes, see below

This is Ipopt version 3.13.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:  1875000

Total number of variables............................:    12500
                     variables with only lower bounds:    12500
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -2.6440765e+02 0.00e+00 7.97e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1 -6.0569020e+02 0.00e+00 4.20e+00  -1.0 8.24e-02    -  1.00e+00 3.61e-01f  1
   2 -7.7536695e+02 0.00e+00 4.24e-02  -1.0 2.68e-02    -  1.00e+00 1.00e+00f  1
   3 -1.2437166e+02 0.00e+00 1.13e+00  -2.5 3.81e-02    -  1.00e+00 1.00e+00f  1
   4 -3.2839354e+01 0.00e+00 3.30e-01  -2.5 7.17e-03    -  1.00e+00 1.00e+00f  1
   5 -1.1183212e+01 0.00e+00 8.80e-02  -2.5 4.78e-03    -  1.00e+00 1.00e+00f  1
   6 -8.0100638e+00 0.00e+00 2.32e-03  -2.5 2.33e-02    -  1.00e+00 1.00e+00f  1
   7  1.0134570e+01 0.00e+00 7.14e-01  -3.8 7.75e-02    -  4.02e-01 1.00e+00f  1
   8  1.5443845e+01 0.00e+00 4.48e-01  -3.8 1.02e-01    -  3.63e-01 1.00e+00f  1
   9  1.7675529e+01 0.00e+00 1.30e-01  -3.8 8.33e-02    -  5.69e-01 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  1.7987258e+01 0.00e+00 8.56e-04  -3.8 5.52e-02    -  1.00e+00 1.00e+00f  1
  11  1.9582324e+01 0.00e+00 2.92e-03  -5.7 2.99e-02    -  8.28e-01 8.88e-01f  1
  12  1.9798199e+01 0.00e+00 1.97e-05  -5.7 8.99e-03    -  1.00e+00 1.00e+00f  1
  13  1.9798484e+01 0.00e+00 2.09e-06  -5.7 4.92e-03    -  1.00e+00 1.00e+00f  1
  14  1.9820655e+01 0.00e+00 1.48e-05  -8.6 2.94e-03    -  9.76e-01 9.83e-01f  1
  15  1.9821050e+01 0.00e+00 1.74e-07  -8.6 1.50e-03    -  1.00e+00 1.00e+00f  1
  16  1.9821051e+01 0.00e+00 3.57e-08  -8.6 6.87e-04    -  1.00e+00 1.00e+00f  1
  17  1.9821051e+01 0.00e+00 4.37e-09  -8.6 2.41e-04    -  1.00e+00 1.00e+00f  1
  18  1.9821081e+01 0.00e+00 4.56e-09 -12.9 4.94e-05    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 18

                                   (scaled)                 (unscaled)
Objective...............:  -1.9821081353715069e+01    1.9821081353715069e+01
Dual infeasibility......:   4.5645540680538943e-09    4.5645540680538943e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   4.0657970261467833e-10   -4.0657970261467833e-10
Overall NLP error.......:   4.5645540680538943e-09    4.5645540680538943e-09


Number of objective function evaluations             = 19
Number of objective gradient evaluations             = 19
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 18
Total CPU secs in IPOPT (w/o function evaluations)   =     60.696
Total CPU secs in NLP function evaluations           =      1.161

EXIT: Optimal Solution Found.
19.82108135371507
 66.078193 seconds (4.56 M allocations: 801.528 MiB, 0.21% gc time)

You should definitely switch linear solvers. It looks like almost all of the time is spent there.


Thanks. I just sent the request. I am installing it on Windows, so I'll see how this goes.

Indeed. Only 19 iterations, so if you find a good linear solver (I can recommend MA57 and Pardiso) it really should speed up the process.

Interesting. Is there a pattern you can identify?
I would definitely follow my previous piece of advice: try an active-set solver (such as filterSQP or SNOPT) and start from the (0, \ldots, 0) solution. With a bit of luck, it won't have to do many iterations to figure out the right active set.
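
For something that can be tried immediately from Julia, NLopt's SLSQP (already in your `using` line) is an SQP-type method. A rough, untested sketch, reusing `func` and the in-place `gradient!` from above (this objective cannot be differentiated at exactly x = 0 because of the X[j]^(-1/p) term, so the sketch starts from a small positive point instead):

# Rough sketch (untested): maximize func over x >= 0 with NLopt's SLSQP,
# reusing func and the in-place gradient! defined earlier.
using NLopt

nl = Opt(:LD_SLSQP, J*Ι)
nl.lower_bounds = zeros(J*Ι)            # x >= 0
nl.xtol_rel = 1e-8
nl.max_objective = (xvec, grad) -> begin
    X = reshape(xvec, J, Ι)             # view the flat vector as a J×Ι matrix
    length(grad) > 0 && gradient!(X, grad)
    return func(X)
end

maxf, maxx, ret = optimize(nl, fill(1e-3, J*Ι))  # small positive starting point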


Yes, there is a clear pattern. However, in my real problem I do not think this pattern is very useful.

Perfect, I will take a look and update you!

Got it. Maybe a coarse estimate can be a good enough initial point :slight_smile:
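
(If the problem is set up as a JuMP model, passing such a guess is just a matter of setting start values before the solve; `model` and `x_guess` below are placeholders.)

# Sketch: warm-start a JuMP model from a coarse J×Ι guess x_guess.
set_start_value.(x, x_guess)
JuMP.optimize!(model)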


Do you know if these have Julia implementations?

You are right. For instance, a good candidate point is the solution for the parameter k=1. In that case, for a given j there will be only one positive x_{ij}, the one that minimizes t_{ij}, and it can be computed analytically.

Edit: this did not work.

Another potential solution is recasting everything as a conic program with power cone constraints and solving it with Mosek, in the following way:

using JuMP, Ipopt, LinearAlgebra, Random, Gurobi, BenchmarkTools, MosekTools, NLopt

const Ι = 250  #This is Greek Letter Iota
const J = 50
const p = 5.0
const k = 0.75

function distmat(J,Ι)
    Distances = zeros(J,Ι)
    Random.seed!(7777)
    coordinates_x = hcat([x for x = 1:J],fill(0.0,J))
    coordinates_y = hcat([x for x = 1:Ι].*J/Ι,fill(0.0,Ι))
    for j = 1:J, l=1:Ι
        Distances[j,l] = sqrt((coordinates_x[j,1]-coordinates_y[l,1])^2+(coordinates_x[j,2]-coordinates_y[l,2])^2)
    end
    return 1 .+ .1*Distances
end

const t = distmat(J,Ι)

function solution_numerical_Mosek(p,k,t)
    opt = Model(Mosek.Optimizer)
    #set_silent(opt)
    @variable(opt, x[1:J,1:Ι] >= 0)
    @variable(opt, Q[1:J] >= 0)
    @variable(opt, Y[1:Ι] >= 0)
    @variable(opt, u[1:J] >= 0)
    @variable(opt, v[1:Ι] >= 0)
    @objective(
        opt,
        Min,
        -sum(u) + sum(v)  
    )
    @constraint(
        opt,
        [j = 1:J],
        sum(x[j,:])==Q[j],
    )
    @constraint(
        opt,
        [i = 1:Ι],
        sum(t[:,i].*x[:,i])==Y[i],
    )
    @constraint(
        opt,
        [j = 1:J],
        [Q[j],  1.0, u[j]] in MOI.PowerCone((p-1)/p)
    )
    @constraint(
        opt,
        [i = 1:Ι],
        [v[i], 1.0, Y[i]] in MOI.PowerCone(k)
    )
    JuMP.optimize!(opt)
    return objective_value(opt), value.(x), value.(u), value.(v), value.(Q), value.(Y)
end
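
To see why this matches the original objective: the constraint [Q_j, 1, u_j] \in \text{PowerCone}((p-1)/p) says Q_j^{(p-1)/p} \cdot 1^{1/p} \ge |u_j|, and since the objective pushes each u_j up, at the optimum u_j = Q_j^{(p-1)/p}. Likewise [v_i, 1, Y_i] \in \text{PowerCone}(k) says v_i^{k} \cdot 1^{1-k} \ge |Y_i|, i.e. v_i \ge Y_i^{1/k}, which minimization makes tight. So the conic objective -\sum_j u_j + \sum_i v_i is the negative of the original \sum_j Q_j^{(p-1)/p} - \sum_i Y_i^{1/k}.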

With a very similar optimal value.

Ipopt:

19.820920746692256

and with Mosek:

-19.820921011259458

and output:

Problem 
  Name                   :
  Objective sense        : min
  Type                   : CONIC (conic optimization problem)
  Constraints            : 1200
  Cones                  : 300
  Scalar variables       : 14000
  Matrix variables       : 0
  Integer variables      : 0

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00
Lin. dep.  - tries                  : 1                 time                   : 0.00
Lin. dep.  - number                 : 0
Presolve terminated. Time: 0.02
Problem
  Name                   :
  Objective sense        : min
  Type                   : CONIC (conic optimization problem)
  Constraints            : 1200
  Cones                  : 300
  Scalar variables       : 14000
  Matrix variables       : 0
  Integer variables      : 0

Optimizer  - threads                : 8
Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 300
Optimizer  - Cones                  : 300
Optimizer  - Scalar variables       : 13400             conic                  : 900
Optimizer  - Semi-definite variables: 0                 scalarized             : 0
Factor     - setup time             : 0.00              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 1.28e+04          after factor           : 1.40e+04
Factor     - dense dim.             : 0                 flops                  : 7.69e+05
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   1.3e+00  1.3e+00  6.7e+02  0.00e+00   3.307189139e+02   -3.342807529e+02  1.0e+00  0.02
1   7.0e-01  7.0e-01  2.4e+02  1.69e+00   1.285879661e+02   -1.641718015e+02  5.2e-01  0.02  
2   3.4e-01  3.4e-01  1.7e+02  4.81e-01   -4.664481087e+01  -3.639480472e+02  2.6e-01  0.02
3   1.5e-01  1.5e-01  1.9e+01  8.20e-01   -8.572806511e+01  -1.942027062e+02  1.1e-01  0.03
4   4.5e-02  4.5e-02  3.2e+00  6.70e-01   -6.992520893e+01  -1.044985663e+02  3.3e-02  0.03
5   2.0e-02  2.0e-02  1.2e+00  1.05e+00   -3.341086702e+01  -4.843598443e+01  1.5e-02  0.03
6   6.3e-03  6.3e-03  2.0e-01  1.29e+00   -2.647315351e+01  -3.067496218e+01  4.7e-03  0.03
7   2.5e-03  2.5e-03  5.2e-02  1.23e+00   -2.115278755e+01  -2.264358240e+01  1.8e-03  0.03  
8   5.1e-04  5.1e-04  5.8e-03  1.03e+00   -2.027652248e+01  -2.058739853e+01  3.8e-04  0.03
9   1.0e-04  1.0e-04  5.2e-04  1.00e+00   -1.991210073e+01  -1.997349925e+01  7.6e-05  0.05
10  1.2e-05  1.2e-05  2.1e-05  9.99e-01   -1.983120268e+01  -1.983837079e+01  8.8e-06  0.05
11  1.3e-06  1.3e-06  7.6e-07  1.00e+00   -1.982204670e+01  -1.982284111e+01  9.8e-07  0.05
12  1.6e-07  1.6e-07  3.3e-08  1.00e+00   -1.982105752e+01  -1.982115461e+01  1.2e-07  0.05
13  2.8e-08  2.7e-08  2.3e-09  1.00e+00   -1.982094383e+01  -1.982096048e+01  2.0e-08  0.05
14  8.7e-09  3.6e-09  1.1e-10  1.00e+00   -1.982092376e+01  -1.982092593e+01  2.7e-09  0.05  
15  2.2e-08  5.2e-10  6.2e-12  1.00e+00   -1.982092119e+01  -1.982092150e+01  3.9e-10  0.06
16  3.9e-09  5.2e-10  6.1e-12  1.00e+00   -1.982092118e+01  -1.982092150e+01  3.9e-10  0.06  
17  1.7e-08  5.0e-10  5.8e-12  1.00e+00   -1.982092117e+01  -1.982092148e+01  3.7e-10  0.08
18  1.7e-08  5.0e-10  5.8e-12  1.00e+00   -1.982092117e+01  -1.982092148e+01  3.7e-10  0.08
19  1.7e-08  5.0e-10  5.7e-12  9.99e-01   -1.982092117e+01  -1.982092147e+01  3.7e-10  0.08
20  2.0e-08  4.9e-10  5.5e-12  1.00e+00   -1.982092116e+01  -1.982092145e+01  3.6e-10  0.08  
21  2.0e-08  4.9e-10  5.5e-12  9.98e-01   -1.982092116e+01  -1.982092145e+01  3.6e-10  0.09
22  2.3e-08  4.9e-10  5.5e-12  9.97e-01   -1.982092116e+01  -1.982092145e+01  3.6e-10  0.09  
23  2.3e-08  4.8e-10  5.5e-12  9.98e-01   -1.982092115e+01  -1.982092145e+01  3.6e-10  0.11
24  1.5e-08  3.1e-10  2.9e-12  1.00e+00   -1.982092101e+01  -1.982092120e+01  2.3e-10  0.11
Optimizer terminated. Time: 0.11

  0.132441 seconds (655.88 k allocations: 52.483 MiB)

But it is 600x (?) faster.

EDIT: Typo in code corrected.


There's a reason people like conic optimization.


If you can reformulate it as a conic problem, try COSMO.jl: it usually works pretty damn well, and the structure of the problem can be reused with another configuration (since you seem to want to solve many problems like this one).

If the worse solution you saw is a precision issue, COSMO will also let you use higher precision such as DoubleFloats or BigFloat.
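
A minimal sketch of what the swap might look like (untested; it assumes the same JuMP formulation as above and that COSMO's power cone support covers these constraints; the tolerance values are only illustrative):

# Sketch (untested): swap the conic solver in the JuMP model above for COSMO.
using COSMO
opt = Model(COSMO.Optimizer)
set_optimizer_attribute(opt, "eps_abs", 1e-8)  # tighten tolerances if the defaults are too loose
set_optimizer_attribute(opt, "eps_rel", 1e-8)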


Hi, thanks so much! I will definitely look into it.

The solution is now almost the same as the Ipopt one; I had a typo in the code.


Building on this same problem:

What is the interpretation of the Lagrange multipliers that JuMP.jl returns for the conic constraints?

For instance, say I run this code

using JuMP, Ipopt, LinearAlgebra, Random, Gurobi, BenchmarkTools, MosekTools, NLopt

const Ι = 250  #This is Greek Letter Iota
const J = 50
const p = 5.0
const k = 0.75

function distmat(J,Ι)
    Distances = zeros(J,Ι)
    Random.seed!(7777)
    coordinates_x = hcat([x for x = 1:J],fill(0.0,J))
    coordinates_y = hcat([x for x = 1:Ι].*J/Ι,fill(0.0,Ι))
    for j = 1:J, l=1:Ι
        Distances[j,l] = sqrt((coordinates_x[j,1]-coordinates_y[l,1])^2+(coordinates_x[j,2]-coordinates_y[l,2])^2)
    end
    return 1 .+ .1*Distances
end

const t = distmat(J,Ι)


function solution_numerical_Mosek(p,k,t)
    opt = Model(Mosek.Optimizer)
    #set_silent(opt)
    @variable(opt, x[1:J,1:Ι] >= 0)
    @variable(opt, Q[1:J] >= 0)
    @variable(opt, Y[1:Ι] >= 0)
    @variable(opt, u[1:J] >= 0)
    @variable(opt, v[1:Ι] >= 0)
    @objective(
        opt,
        Min,
        -sum(u) + sum(v)
    )
    @constraint(
        opt,
        [j = 1:J],
        sum(x[j,:])==Q[j],
    )
    @constraint(
        opt,
        [i = 1:Ι],
        sum(t[:,i].*x[:,i])==Y[i],
    )
    @constraint(
        opt,
        [j = 1:J],
        [Q[j],  1.0, u[j]] in MOI.PowerCone((p-1)/p)
    )
    @constraint(
        opt,
        cons[i = 1:Ι],
        [v[i], 1.0, Y[i]] in MOI.PowerCone(k)
    )
    JuMP.optimize!(opt)
    return objective_value(opt), value.(x), value.(u), value.(v), value.(Q), value.(Y), dual.(cons)
end

sol_Mosek = @time solution_numerical_Mosek(p,k,t)

sol_Mosek[7]

Looking at the output, the last array that the function produces holds the duals of the last set of constraints. I get:

250-element Vector{Vector{Float64}}:
 [0.9999999851948024, 0.26745049727837067, -0.5742708009654043]
 [0.9999999855626119, 0.2737730991899862, -0.5851060990300552]
 [0.9999999859286929, 0.2803699276341491, -0.5963581393283741]
 [0.9999999862930139, 0.2872585002963847, -0.6080514361094072]
 [0.9999999866555548, 0.294457827370603, -0.6202124647621234]
 [0.9999999862930065, 0.2872585002868521, -0.6080514361094109]
 ⋮
 [0.9999999875163498, 0.3130998007092086, -0.6514307285877697]
 [0.9999999871825755, 0.3055915575179915, -0.638903238124342]
 [0.9999999875105084, 0.31295508254897175, -0.6511898386528795]
 [0.9999999878368773, 0.32064425895101034, -0.663958266795144]
 [0.9999999881616518, 0.32868030617677824, -0.6772374320635317]

Say that we take the second element in the first row. What does it mean? Is it the change in the objective function that comes from changing the 1.0 in that conic constraint infinitesimally?

I am checking the Mosek documentation (Duality in conic optimization — MOSEK Modeling Cookbook 3.3.0), and I understand how the dual cone is defined. However, I do not understand what the values that JuMP is giving me correspond to, or how to interpret them.

These are conic duals, not Lagrange multipliers. See Duality · JuMP.

You’ll have to refer to a textbook for the full story, but here’s one perturbation interpretation.

A conic constraint:

A_i x + b_i \in C_i

has a dual multiplier y_i \in C_i^*. The dual objective has a term -b_i^Ty_i. This means that if y_i is a dual solution with objective value T and you perturb \tilde b_i = b_i + \epsilon then there exists a dual feasible solution with objective value T - \epsilon^Ty_i. Assuming minimization, if T is the optimal objective value and - \epsilon^Ty_i is positive then you learn that the optimal objective value must increase by at least - \epsilon^Ty_i in response to the perturbation.
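
One way to make this concrete on the model above (a sketch: it re-solves with the constant 1.0 in one power cone constraint shifted by \epsilon and compares the change in the optimal value with the dual-based prediction; the sign convention should be double-checked against the JuMP duality docs):

# Sketch: numerically check the perturbation interpretation for the i0-th
# power cone constraint [v[i0], 1.0, Y[i0]] in PowerCone(k).
function solve_perturbed(p, k, t, i0, ε)
    opt = Model(Mosek.Optimizer)
    set_silent(opt)
    @variable(opt, x[1:J,1:Ι] >= 0)
    @variable(opt, Q[1:J] >= 0)
    @variable(opt, Y[1:Ι] >= 0)
    @variable(opt, u[1:J] >= 0)
    @variable(opt, v[1:Ι] >= 0)
    @objective(opt, Min, -sum(u) + sum(v))
    @constraint(opt, [j = 1:J], sum(x[j,:]) == Q[j])
    @constraint(opt, [i = 1:Ι], sum(t[:,i].*x[:,i]) == Y[i])
    @constraint(opt, [j = 1:J], [Q[j], 1.0, u[j]] in MOI.PowerCone((p-1)/p))
    b2 = [i == i0 ? 1.0 + ε : 1.0 for i = 1:Ι]   # perturb only constraint i0
    @constraint(opt, [i = 1:Ι], [v[i], b2[i], Y[i]] in MOI.PowerCone(k))
    JuMP.optimize!(opt)
    return objective_value(opt)
end

i0, ε = 1, 1e-4
y = sol_Mosek[7][i0]                        # dual triplet of the i0-th power cone constraint
Δ_observed  = solve_perturbed(p, k, t, i0, ε) - sol_Mosek[1]
Δ_predicted = -ε * y[2]                     # the perturbation enters the second entry of b_i
@show Δ_observed Δ_predicted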


Hi @miles.lubin, thanks so much for the answer. It is very clear conceptually. However, in terms of the output of JuMP, what is the interpretation of the triplet that I am getting for each constraint?

Trying to be more precise: in my MWE, for the first power cone constraint cons[1] I get


[0.9999999851948024, 0.26745049727837067, -0.5742708009654043]

What does each of these numbers correspond to?

I could not find anything in the documentation that speaks to this.

Each vector is the dual solution of the corresponding power cone constraint.

But can the elements of this vector be interpreted as sensitivities to perturbing the corresponding entries of the power cone constraint, or not?

A better question might be: what do you want to do with the dual solution?

If you only care about the primal solution, you can ignore the dual solution. There’s no need to interpret it.