Lagragian multipliers in JuMP

I appreciate questions about Lagrange multipliers in Jump have been discussed before, here the sign change is described. However, I’m having trouble reproducing example 6.32 (page 163) in Nonlinear and mixed-integer optimization : fundamentals and applications. This problem is as follows

using JuMP
using Ipopt

model = Model(Ipopt.Optimizer)

@variable(model, 0 ≤ A[2:3])
@variable(model, 0 ≤B[1:3])
@variable(model, 0 ≤C)
@variable(model, y[1:3], binary=false)

@NLconstraint(model,b2con, B[2] - log(1+A[2])==0)
@NLconstraint(model, b3con,B[3] - 1.2*log(1+A[3])==0)
@constraint(model, C-0.9*(B[1]+B[2]+B[3])==0)

@constraint(model, C-y[1] ≤0)
@constraint(model, B[2] - (1/0.9)*y[2] ≤0)
@constraint(model, B[3] - (1/0.9)*y[3] ≤0)
@constraint(model, y[2]+y[3] ≤1)

@objective(model, Min, -11C + 7B[1] + B[2] 
+ 1.2B[3]+1.8A[2]+1.8A[3]
+ 3.5y[1]+y[2]+1.5y[3])

fix(y[1], 1.0, force=true)
fix(y[2], 1.0, force=true)
fix(y[3], 0.0, force=true)
optimize!(model)

The book suggests we should have the multipliers [5.46792, 0] for the nonlinear constraints. However I get

julia> dual.(all_nonlinear_constraints(model))
2-element Vector{Float64}:
 -5.467917201544272
 -1.382785873446931

The first value is fine as this is a minimisation problem we expect it to be nonpositive. But why is the second Lagrangians value different? I get the same objective value as the book and all my variables have the same values as the book. As the constraints are non-linear Ipopt nor SCIP are able to provide the shadow price.

For another iteration the book sets y=[1,0,1] with this I get a similar result, my \lambda_1 is negative when I expect 0 and my \lambda_2 = -\lambda^{\text{book}}_2 .

Apologies if the issue is simply my lack of understanding.

1 Like

I don’t have access to the book so I can’t offer much advice. However are they equivalent solutions?

The problem is likely that b3con is an equality constraint containing B[3], but B[3] takes the value near 0.0 in an optimal solution and B[3] >= 0. So there is also a dual variable on the >= 0 constraint, which you can access with dual(LowerBoundRef(B[3])).

So you’ll likely get different dual solutions depending on small differences in the numerical primal solution found by the solver. You can see this by setting set_optimizer_attribute(model, "tol", 1e-10) to different values and looking at the resulting duals.

2 Likes
Book page.

2 Likes

What’s interesting is, following Computing Hessians · JuMP

julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt

julia> begin
           set_silent(model)
           @variable(model, 0 ≤ A[2:3])
           @variable(model, 0 ≤B[1:3])
           @variable(model, 0 ≤C)
           @variable(model, y[1:3], binary=false)
           @NLconstraint(model,b2con, B[2] - log(1+A[2])==0)
           @NLconstraint(model, b3con,B[3] - 1.2*log(1+A[3])==0)
           @constraint(model, C-0.9*(B[1]+B[2]+B[3])==0)
           @constraint(model, C-y[1] ≤0)
           @constraint(model, B[2] - (1/0.9)*y[2] ≤0)
           @constraint(model, B[3] - (1/0.9)*y[3] ≤0)
           @constraint(model, y[2]+y[3] ≤1)
           @objective(model, Min, -11C + 7B[1] + B[2] 
           + 1.2B[3]+1.8A[2]+1.8A[3]
           + 3.5y[1]+y[2]+1.5y[3])
           fix(y[1], 1.0, force=true)
           fix(y[2], 1.0, force=true)
           fix(y[3], 0.0, force=true)
           optimize!(model)
       end

julia> function compute_optimal_hessian(model)
           d = NLPEvaluator(model)
           MOI.initialize(d, [:Hess])
           hessian_sparsity = MOI.hessian_lagrangian_structure(d)
           I = [i for (i, _) in hessian_sparsity]
           J = [j for (_, j) in hessian_sparsity]
           V = zeros(length(hessian_sparsity))
           x = all_variables(model)
           x_optimal = value.(x)
           y_optimal = dual.(all_nonlinear_constraints(model))
           MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
           n = num_variables(model)
           return SparseArrays.sparse(I, J, V, n, n)
       end
compute_optimal_hessian (generic function with 1 method)

julia> H_star = compute_optimal_hessian(model)
9×9 SparseArrays.SparseMatrixCSC{Float64, Int64} with 2 stored entries:
 -0.592547    ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅        -1.65934   ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 
   ⋅          ⋅        ⋅    ⋅    ⋅    ⋅    ⋅    ⋅    ⋅ 

julia> value(C)
1.000000009290023

julia> value.(B)
3-element Vector{Float64}:
 -1.4572662711616068e-10
  1.1111111121624373
  9.41664807677756e-9

julia> value.(A)
1-dimensional DenseAxisArray{Float64,1,...} with index sets:
    Dimension 1, 2:3
And data, a 2-element Vector{Float64}:
 2.0377317807110282
 7.847206806204714e-9

julia> objective_value(model)
-1.7209717603431018

So Ipopt converges to the optimal point, but the Hessian is negative semidefinite.

I think it’s just because the model is very badly designed. There are lots of redundant variables and constraints.

For example, if y[3] = 0, then B[3] = 0, which forces A[3] = 0. Similarly, B[2] >= 0 is redundant if B[2] = log(1 + A[2]) where A[2] >= 0. With a little bit of algebra, you can show that the entire model simplifies to:

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, A >= 0)
           @NLconstraint(model, 0.9 * log(1 + A) <= 1)
           @NLexpressions(model, begin
               B₂, log(1 + A)
               B₁, 1 - 0.9 * B₂
               C, 0.9 * (B₁ + B₂)
           end)
           @NLobjective(model, Min, -11C + 7B₁ + B₂ + 1.8A + 4.5)
           optimize!(model)
       end

julia> value(C), value(B₁), value(B₂), value(A), objective_value(model)
(1.0000000007201448, -7.201449214733202e-9, 1.1111111191127212, 2.037731801824228, -1.7209716959354049)

julia> H_star = compute_optimal_hessian(model)
1×1 SparseArrays.SparseMatrixCSC{Float64, Int64} with 1 stored entry:
 0.770722

and now Ipopt finds a local minimum.

In fact, it simplifies even further, to

julia> begin
           model = Model(Ipopt.Optimizer)
           set_silent(model)
           @variable(model, A >= 0)
           @NLconstraint(model, 0.9 * log(1 + A) <= 1)
           @NLobjective(model, Min, -6.29 * log(1 + A) + 1.8 * A + 1.6)
           optimize!(model)
       end

julia> value(A), objective_value(model)
(2.037731801824228, -1.7209716959354062)

and you can see that the underlying problem is indeed convex.

Summarized: your model, as originally written, is non-convex (it has nonlinear equality constraints). Ipopt assumes the problem is convex, so it can converge to a stationary point instead of a local minima. In this case, it still converges to the optimal primal point, but the duals are off. In the general case, user beware.

2 Likes

image

2 Likes

Thank you for your explanation.

I presume if this was Linear and I used a global solver, e.g. Gurobi this wouldn’t be an issue?

When I use this method I’ll heed this advice.

Yes. But also, if the problem is convex, Ipopt will also find a global optima.

1 Like