SLOW_PROGRESS or NUMERICAL_ERROR: a simple but nontrivial SDP example, solved with LP relaxation

I wonder why Mosek cannot solve this simple problem to optimality (it reports SLOW_PROGRESS). Cross-reference: [docs] improve the docstring for DUAL_INFEASIBLE by odow · Pull Request #2701 · jump-dev/MathOptInterface.jl

import JuMP, MosekTools
model = JuMP.Model(MosekTools.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x);
JuMP.@constraint(model, [0 y; y x] in JuMP.PSDCone())
JuMP.@objective(model, Min, 2 * y)
JuMP.optimize!(model)
JuMP.assert_is_solved_and_feasible(model; allow_local = false) # ERROR

Actually, if you remove the objective (i.e., solve a pure feasibility system), it works fine. But as soon as a linear objective is added, it fails.
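For concreteness, the feasibility-only variant is the same model with the @objective line dropped:

import JuMP, MosekTools
model = JuMP.Model(MosekTools.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x);
JuMP.@constraint(model, [0 y; y x] in JuMP.PSDCone())
# no objective: a pure feasibility system
JuMP.optimize!(model)
JuMP.assert_is_solved_and_feasible(model; allow_local = false) # ✅ no ERROR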

I did another test with Clarabel; it terminates with NUMERICAL_ERROR.
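The Clarabel version of the same model is simply:

import JuMP, Clarabel
model = JuMP.Model(Clarabel.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x);
JuMP.@constraint(model, [0 y; y x] in JuMP.PSDCone())
JuMP.@objective(model, Min, 2 * y)
JuMP.optimize!(model)
JuMP.termination_status(model) # NUMERICAL_ERROR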

It appears that the feasibility system of the dual problem satisfies the second case ("limit feasible") of the conic Farkas' lemma.
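To spell this out (my own derivation): write the primal as $\min \{\, 2y \mid y A_1 + x A_2 \succeq 0 \,\}$ with $A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ and $A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. The dual feasibility system asks for $S \succeq 0$ with $\langle A_1, S \rangle = 2$ and $\langle A_2, S \rangle = 0$, i.e. $S_{12} = 1$ and $S_{22} = 0$, which no PSD matrix satisfies exactly (its determinant would be $-1$). However, $S_k = \begin{bmatrix} k & 1 \\ 1 & 1/k \end{bmatrix} \succeq 0$ satisfies the equations up to a residual $1/k \to 0$, so the dual system is infeasible yet feasible in the limit.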

I had a question in my mind:

  • Why does Gurobi have a parameter named PSDCuts even though it does not allow users to model with PSD constraints?

Now I guess that it might be because SDPs are prone to numerical issues.

I did two more tests where I add artificial bounds to x.
Here are the results:

import JuMP, Clarabel
model = JuMP.Model(Clarabel.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x);
JuMP.@constraint(model, [0 y; y x] in JuMP.PSDCone())
JuMP.@constraint(model, 0 <= x <= 9)
JuMP.@objective(model, Min, 2 * y)
JuMP.optimize!(model)
JuMP.assert_is_solved_and_feasible(model; allow_local = false) # ✅ no ERROR
JuMP.value(x) # 8.888332507030219
JuMP.value(y) # -2.119040588190556e-8

import JuMP, MosekTools # I used Mosek v10
model = JuMP.Model(MosekTools.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x);
JuMP.@constraint(model, [0 y; y x] in JuMP.PSDCone())
JuMP.@constraint(model, 0 <= x <= 9)
JuMP.@objective(model, Min, 2 * y)
JuMP.optimize!(model)
JuMP.assert_is_solved_and_feasible(model; allow_local = false) # ERROR: SLOW_PROGRESS

The following are two comparison tests, in which Clarabel reports OPTIMAL while Mosek fails.

import JuMP, MosekTools # I used Mosek v10
model = JuMP.Model(MosekTools.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x)
JuMP.@constraint(model, [x y; y 1] in JuMP.PSDCone())
JuMP.@objective(model, Min, 2 * y)
JuMP.optimize!(model)
JuMP.termination_status(model) # SLOW_PROGRESS

import JuMP, Clarabel
model = JuMP.Model(Clarabel.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x)
JuMP.@constraint(model, [x y; y 1] in JuMP.PSDCone())
JuMP.@objective(model, Min, 2 * y)
JuMP.optimize!(model)
JuMP.termination_status(model) # OPTIMAL

When I employ Mosek to solve a feasibility system, its tolerance looks like this:

import JuMP, MosekTools # I used Mosek v10
model = JuMP.Model(MosekTools.Optimizer)
JuMP.@variable(model, y); JuMP.@variable(model, x)
JuMP.@constraint(model, [x y; y 0] in JuMP.PSDCone())
JuMP.set_upper_bound(x, -0.000)
JuMP.optimize!(model)
JuMP.termination_status(model) # OPTIMAL
JuMP.set_upper_bound(x, -0.001)
JuMP.optimize!(model)
JuMP.termination_status(model) # SLOW_PROGRESS
JuMP.set_upper_bound(x, -0.010)
JuMP.optimize!(model)
JuMP.termination_status(model) # INFEASIBLE

I did a different optimization test with Clarabel; its tolerance looks like this:

julia> function print_termination_status(t)
           model = JuMP.Model(Clarabel.Optimizer)
           JuMP.set_silent(model)
           JuMP.@variable(model, y); JuMP.@variable(model, x)
           JuMP.@constraint(model, [x y; y t] in JuMP.PSDCone())
           JuMP.@objective(model, Min, 2 * y)
           JuMP.optimize!(model)
           JuMP.termination_status(model) |> println
       end

julia> print_termination_status(0)
NUMERICAL_ERROR

julia> print_termination_status(1e-16)
NUMERICAL_ERROR

julia> print_termination_status(1e-15)
OPTIMAL

julia> print_termination_status(1e-4)
ALMOST_OPTIMAL

julia> print_termination_status(1e-2)
OPTIMAL

This isn’t really a numerical issue. It’s related to a much more fundamental issue: some conic programs do not satisfy strong duality. (This doesn’t happen in an LP; strong duality always holds there).

why does Gurobi … named PSDCuts even though it does not allow users to model with PSD constraints?

Because the PSDCuts parameter controls the PSD nature of the Q matrix in a quadratic function x' Q x. It is unrelated to the concept of a variable matrix being PSD. Gurobi does not support PSD constraints.
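As a side note (my own illustration, not from the discussion above): since PSDCuts is a solver parameter rather than a modeling construct, it is set as a raw optimizer attribute, e.g.

import JuMP, Gurobi
model = JuMP.Model(Gurobi.Optimizer)
# PSDCuts tunes cut generation inside Gurobi's solve; it does not add a PSD
# constraint to the model (the value 2 requests aggressive cut generation).
JuMP.set_attribute(model, "PSDCuts", 2)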

Actually, it always holds as long as at least one of the two LP problems (primal or dual) is feasible.
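For the record, the standard statement: in LP, if at least one of the primal/dual pair is feasible, the two optimal values coincide in the extended-value sense (a feasible but unbounded primal pairs with an infeasible dual, and vice versa); the only case with a genuine duality gap is when both problems are infeasible.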

It seems I have a clearer understanding of this now:

  1. Gurobi’s “PSDCut” is (potentially) added during the MIP solving process.
  2. Although Gurobi calls it a “PSDCut”, it is actually not a full PSD constraint; it might just be a set of linear cuts.

Due to their poor performance on this simple test case, I propose to solve it via LP relaxation.
The crux is to introduce a proper artificial bound.
After adding about 23 linear cuts, we converge to the global optimal value 0.0.

import JuMP, Gurobi, LinearAlgebra
function my_eigmin(A)
    valmin, vecmin = LinearAlgebra.eigen(A, 1:1)
    return first(valmin), vecmin[:, 1]
end
function optimise(model)
    JuMP.optimize!(model)
    JuMP.assert_is_solved_and_feasible(model; allow_local = false, dual = true)
end
ϵ = 1e-13 # tolerance on the smallest eigenvalue of a Symmetric matrix
ib = 9. # 🍏 pick an artificial initial bound
model = JuMP.Model(Gurobi.Optimizer) 
JuMP.@variable(model, -ib <= y <= ib); JuMP.@variable(model, -ib <= x <= ib)
JuMP.@expression(model, X, LinearAlgebra.Symmetric([0 y; y x]))
# JuMP.@constraint(model, X in PSDCone()) # We'll substitute this PSD cut by linear cuts
JuMP.@objective(model, Min, 2 * y)
for i in 0:typemax(Int) # iteratively tighten the LP relaxation
    optimise(model)
    lb = JuMP.objective_bound(model); @info "i = $i, lb = $lb"
    Xt = LinearAlgebra.Symmetric(JuMP.value.(X))
    valmin, t = my_eigmin(Xt)
    (valmin > -ϵ ? break : JuMP.@constraint(model, t' * X * t >= 0))
end
# during the above for-loop, there is only one (mild) Warning reported by Gurobi
# julia>  [ Info: i = 22, lb = -4.365443091908161e-6
# julia>  Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
optimise(model)
S = [
    -JuMP.dual(JuMP.UpperBoundRef(x)) -JuMP.dual(JuMP.UpperBoundRef(y)); 
    JuMP.dual(JuMP.LowerBoundRef(x))  JuMP.dual(JuMP.LowerBoundRef(y))
] # We observe that sensitivity == 0
# Hence we can delete our artificial bounds
JuMP.delete_upper_bound(x); JuMP.delete_upper_bound(y) 
JuMP.delete_lower_bound(x); JuMP.delete_lower_bound(y) 
optimise(model) # then we reoptimize
lb = JuMP.objective_bound(model); @info "lb = $lb"
Xt = LinearAlgebra.Symmetric(JuMP.value.(X))
valmin, t = my_eigmin(Xt)
@assert valmin >= -ϵ # double check the primal feasibility
ub = 2 * Xt[1, 2]
# ✅ here lb = ub = 0.0
# ✅ The LP relaxation method converges to global optimality

Remark: tuning ib is nontrivial; e.g. ib = 7.8 would be an improper choice, with which the for-loop cannot terminate quickly.

There are two aspects of the code above that could be improved: the smallest-eigenvalue computation and the use of the Symmetric wrapper.
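For example, the eigenvalue step could also be written like this (my own sketch, not what the code above uses): for a LinearAlgebra.Symmetric input, eigen returns eigenvalues in ascending order, so the smallest eigenpair is just the first one of a full decomposition:

import LinearAlgebra
function my_eigmin_alt(A::LinearAlgebra.Symmetric) # hypothetical alternative to `my_eigmin`
    F = LinearAlgebra.eigen(A) # eigenvalues come back sorted in ascending order
    return F.values[1], F.vectors[:, 1] # smallest eigenvalue and its eigenvector
end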

The above post applies LP relaxation to the SDP cut.
Note that a 2x2 symmetric matrix is PSD iff its diagonal entries and its determinant are nonnegative, so the constraint [0 y; y x] in PSDCone() amounts to x >= 0 and y^2 <= 0.
If we reformulate the SDP constraint into y^2 <= 0, we can also approach the problem as follows.

import JuMP, Gurobi
function optimise(model)
    JuMP.optimize!(model)
    JuMP.assert_is_solved_and_feasible(model; allow_local = false, dual = true)
end

# The [primal problem] is
# Min   2y
# s.t.  y^2 <= 0
# whose theoretical OPTIMAL value is 0.
# But if we use LP-relaxation to tackle it,
# we can only converge to a point `y = -0.000213623046875` that is feasible only within tolerance,
# with a practical OPTIMAL value of about `-0.0004`

# 🍏 These 2 are hyperparameters
ib = 7 # initial artificial bound
COT = 1e-7 # `cut-off tolerance`: a moderate value, given Gurobi's behavior

model = JuMP.Model(Gurobi.Optimizer) # We employ this surrogate of the primal problem
JuMP.@variable(model, -ib <= y <= ib)
JuMP.@objective(model, Min, 2 * y)
while true
    optimise(model)
    yt = JuMP.value(y) # generate a trial point that might be infeasible for the primal problem
    if yt^2 >= 0 + COT # the cut to be added can strongly cut off the current trial solution `yt`
        JuMP.@constraint(model, yt^2 + 2 * yt * (y - yt) <= 0) # gradient cut, valid because y^2 is convex (first-order condition)
    else
        break
    end
end # 0 errors, 0 warnings
optimise(model)
yt = JuMP.value(y)
@assert yt^2 <= COT # constraint y^2 <= 0 holds WITH tolerance `COT`
@assert JuMP.dual(JuMP.UpperBoundRef(y)) == 0.0
@assert JuMP.dual(JuMP.LowerBoundRef(y)) == 0.0
# Since the above 2 asserts pass, we can safely do the following 2 lines
JuMP.delete_upper_bound(y)
JuMP.delete_lower_bound(y)
optimise(model) # At this time, `model` is indeed an LP relaxation of the primal problem
yt = JuMP.value(y)
@assert yt^2 <= COT
lb = JuMP.objective_bound(model) # -0.00042724609375
ub = 2 * yt # -0.00042724609375
# global optimality is attained

This should be a final perspective on the troublesome SDP problem proposed by @odow

$$\text{(primal problem)} \quad \min_{x, y} \{\, 2y \mid M \in \mathrm{PSD} \,\}, \quad \text{where } M = \begin{bmatrix} 0 & y \\ y & x \end{bmatrix}.$$

The difficulty of this problem lies in the unboundedness of the variables x and y.
When I employ the LP relaxation to tackle the primal problem, it turns out to be nontrivial to decide the initial artificial bounds on x and y (the free variables). Because of this, I have to add a presolve loop (🔵).

The full code is as follows (this is a fairly mature, close-to-definitive version):

import JuMP, LinearAlgebra
import Gurobi; GRB_ENV = Gurobi.Env()
function my_eigmin(A)
    valmin, vecmin = LinearAlgebra.eigen(A, 1:1)
    return first(valmin), vecmin[:, 1]
end
function optimise(model)
    JuMP.optimize!(model)
    JuMP.assert_is_solved_and_feasible(model; allow_local = false, dual = true)
end
CoT = 1e-7 # linear cut-off tolerance, set according to my experience of using Gurobi
function get_result_with(ib) # `ib` is an indispensable artificial initial bound
    model = JuMP.Model(() -> Gurobi.Optimizer(GRB_ENV)); JuMP.set_silent(model)
    JuMP.@variable(model, -ib <= y <= ib); JuMP.@variable(model, -ib <= x <= ib)
    JuMP.@expression(model, X, LinearAlgebra.Symmetric([0 y; y x]))
    JuMP.@objective(model, Min, 2 * y)
    while true
        optimise(model)
        Xt = LinearAlgebra.Symmetric(JuMP.value.(X)) # a trial solution that may be infeasible for the primal problem
        _, t = my_eigmin(Xt) # generate a linear cut coefficient at the current trial solution
        if t' * Xt * t <= -CoT # the current trial solution `Xt` can be strongly cut off
            JuMP.@constraint(model, t' * X * t >= 0) # add a linear cut, a valid substitute for the PSD constraint
        else # no more linear cuts can be added
            break
        end
    end
    yu = -JuMP.dual(JuMP.UpperBoundRef(y))
    xu = -JuMP.dual(JuMP.UpperBoundRef(x))
    yl =  JuMP.dual(JuMP.LowerBoundRef(y))
    xl =  JuMP.dual(JuMP.LowerBoundRef(x))
    aggregated_sensitivity = +(yu, xu, yl, xl)
    lb = JuMP.objective_bound(model)
    return aggregated_sensitivity, lb
end

# 🔵 The following loop is deemed a "presolve" procedure
for i in 1:17 # The aim is to find a good artificial initial bound
    as, lb = get_result_with(exp(i))
    @info "i = $i, sensitivity = $as, lb = $lb"
end # we observe that `exp(15)` might be a good artificial bound

ib = exp(15) # an artificial initial bound
model = JuMP.Model(Gurobi.Optimizer); JuMP.set_silent(model)
JuMP.@variable(model, -ib <= y <= ib); JuMP.@variable(model, -ib <= x <= ib)
JuMP.@expression(model, X, LinearAlgebra.Symmetric([0 y; y x]))
JuMP.@objective(model, Min, 2 * y)
while true
    optimise(model)
    Xt = LinearAlgebra.Symmetric(JuMP.value.(X))
    _, t = my_eigmin(Xt)
    (t' * Xt * t <= -CoT ? JuMP.@constraint(model, t' * X * t >= 0) : break)
end
optimise(model)
yu = -JuMP.dual(JuMP.UpperBoundRef(y))
xu = -JuMP.dual(JuMP.UpperBoundRef(x))
yl =  JuMP.dual(JuMP.LowerBoundRef(y))
xl =  JuMP.dual(JuMP.LowerBoundRef(x))
aggregated_sensitivity = +(yu, xu, yl, xl)
@assert aggregated_sensitivity == 0
JuMP.delete_lower_bound(x);JuMP.delete_lower_bound(y);
JuMP.delete_upper_bound(x);JuMP.delete_upper_bound(y);
optimise(model) # At this time, the `model` is indeed a valid LP relaxation of the primal SDP
@assert LinearAlgebra.norm(JuMP.value.(X)) == 0 # we succeed in finding the optimal solution of the primal SDP
@assert JuMP.objective_bound(model) == JuMP.objective_value(model) == 0

It is nontrivial to decide a good initial artificial bound; no wonder Mosek and Clarabel both fail on this problem.