SCS and Mosek give very different answers

Hi,

I’m looking for help with semidefinite programming libraries, in particular Mosek.jl and SCS.jl, which I am accessing via JuMP.

I have a particular semidefinite program, unfortunately not so small, for which SCS and Mosek seem to have wildly different behavior. The JuMP code to create the SDP is below. The objective function of the SDP involves maximizing a sum of 20 variables, all of which are constrained to be at most 1, so the objective value is certainly at most 20. SCS seems to be unstable on this SDP, as different runs of SCS give pretty different objective values. However, it reliably tells me it has found optimal feasible solutions with objective values in the 100s.

Mosek, by contrast, reliably halts with the SLOW_PROGRESS error with unknown feasibility status, and objective value around 5. I suspect this is close to the correct optimal value.

Can anyone help me diagnose what is going on here? Perhaps there is a better way to write this SDP so SCS/Mosek can reliably solve it? Also let me know if I should be posting on some SCS forum instead of the julia forums.

Thank you!
Sam Hopkins (MIT EECS/CSAIL)

n = 20
X1 = [-1.8720260374246223, -1.8781458774724729, -0.17791029084388438, -0.8462955398955371, -2.0469710096036433, -0.35975721626597695, 0.8842577434655674, -1.3223561839837161, 0.6896142719006527, -0.36871492095552355, 2.5457019309298645, 0.43255645627695055, 1.1829341730339282, 0.27928496225015176, -0.7190127944803778, -0.3181873711543086, -0.777522351529572, -0.7510533778023913, 1.092447119866926, -0.7166669366612253]
X2 = [1.1070459415658573, 0.5795843324866427, -0.5863786589759239, -0.6303514730508921, -2.134220474514286, -2.1073153136370113, -1.738225867184788, -0.6228120766649945, 0.3489472785183505, -0.007112356145337337, -0.3095732980138016, 0.6191937516367954, -1.1757497744128709, 0.489785023278765, -0.6099680522048614, 1.5311815120096015, 1.3767360330748495, 0.36292134299401857, -0.38201359174032207, 1.1159100799295802]
y = [1.4482646171826903, -0.13640237028718938, -1.935472303485774, -2.7384350612655957, -8.456692621719586, -6.67386946936305, -4.320712475168053, -3.208524514003925, 1.7419313772371354, -0.3947635292123156, 1.6165609185591465, 2.296956225988994, -2.3443423784949617, 1.7517382625370639, -2.5502531514767033, 4.282043983846345, 3.3633475815066287, 0.32836024839909755, -0.055302855197776786, 2.624222775581259]


y = [X1[i] + 3*X2[i] + 0.01*randn() for i in 1:n]

model = Model(SCS.Optimizer)

@variable(model, 1 >= W[1:n] >= 0)
@variable(model, 1 >= W2[1:n,1:n] >= 0)
@variable(model, beta)
@variable(model, beta2)
@variable(model, beta3)
@variable(model, beta4)
@variable(model, beta5)
@variable(model, beta6)
@variable(model, beta7)
@variable(model, beta8)
@variable(model, Wbeta[1:n])
@variable(model, Wbeta2[1:n] >= 0)
@variable(model, Wbeta3[1:n])
@variable(model, Wbeta4[1:n] >= 0)

# W2 is symmetric

for i in 1:n
    for j in 1:n
        @constraint(model, W2[i,j] == W2[j,i])
    end
end

# diagonals of W2 match W

for i in 1:n
    @constraint(model, W2[i,i] == W[i])
end

# lifted positivity constraints corresponding to W[i] is a square

for i in 1:n
    @constraint(model, [1 W[i] Wbeta[i] Wbeta2[i]; W[i] W[i] Wbeta[i] Wbeta2[i]; Wbeta[i] Wbeta[i] Wbeta2[i] Wbeta3[i]; Wbeta2[i] Wbeta2[i] Wbeta3[i] Wbeta4[i]] in PSDCone())
end

# wi * beta^2 <= beta^2

for i in 1:n
    @constraint(model, Wbeta2[i] <= beta2)
end
        


# (n+2) x (n+2) matrix is psd

@constraint(model, [1 beta beta2 beta3 beta4 W'; beta beta2 beta3 beta4 beta5 Wbeta'; beta2 beta3 beta4 beta5 beta6 Wbeta2' ; beta3 beta4 beta5 beta6 beta7 Wbeta3' ; beta4 beta5 beta6 beta7 beta8 Wbeta4' ; W Wbeta Wbeta2 Wbeta3 Wbeta4 W2] in PSDCone())

# define residuals for beta regression, assumes that beta2 = 0
weighted_residuals = [X1[i] * Wbeta[i] - y[i] * W2[i,i] for i in 1:n]

# define weighted residuals lifted by beta
weighted_residuals_beta = [X1[i] * Wbeta2[i] - y[i] * Wbeta[i] for i in 1:n]


# constrain to have gradient zero
@constraint(model, sum([X1[i] * weighted_residuals[i] for i in 1:n]) == 0)
@constraint(model, sum([X2[i] * weighted_residuals[i] for i in 1:n]) == 0)
@constraint(model, sum([X1[i] * weighted_residuals_beta[i] for i in 1:n]) == 0)
@constraint(model, sum([X2[i] * weighted_residuals_beta[i] for i in 1:n]) == 0)


# maximize the number of selected samples
@objective(model, Max, sum([W2[i,i] for i in 1:n]))

optimize!(model)
println("Optimal objective value: ", objective_value(model))
1 Like

Hi @Sam_Hopkins

You should be very suspicious of SDPs which Mosek struggles to solve. This looks like a bug in SCS. It returns a solution with a very large primal residual, but claims optimality. If you look at the solution, the W value are not in [0, 1].

julia> optimize!(model)
------------------------------------------------------------------
	       SCS v3.2.4 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 508, constraints m: 2039
cones: 	  z: primal zero / dual free vars: 614
	  l: linear vars: 900
	  s: psd vars: 525, ssize: 21
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
	  compiled with openmp parallelization enabled
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 2764, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 3.03e+00  6.10e+00  2.81e+01 -1.49e+01  1.00e-01  2.38e-03 
   250| 3.37e-02  2.52e-03  1.92e-01 -4.47e+00  1.00e-01  5.19e-02 
   500| 3.01e-02  1.76e-03  1.42e-01 -4.30e+00  1.00e-01  1.02e-01 
   750| 3.07e-02  1.45e-03  1.17e-01 -4.21e+00  1.00e-01  1.51e-01 
  1000| 2.70e-02  1.31e-03  1.04e-01 -4.14e+00  1.00e-01  1.99e-01 
  1250| 2.29e-02  1.24e-03  9.01e-02 -4.08e+00  1.00e-01  2.47e-01 
  1500| 1.93e-02  1.19e-03  7.68e-02 -4.04e+00  1.00e-01  2.94e-01 
  1750| 1.62e-02  1.14e-03  6.66e-02 -4.00e+00  1.00e-01  3.42e-01 
  2000| 1.89e-02  1.08e-03  1.29e-03 -3.96e+00  1.80e-03  3.92e-01 
  2250| 2.84e-01  1.08e-03  8.82e-04 -3.96e+00  1.29e-04  4.39e-01 
  2500| 1.51e+00  1.06e-03  1.72e-02 -4.08e+00  4.05e-05  4.87e-01 
  2750| 5.68e+00  1.02e-03  6.56e-02 -5.70e+00  4.05e-05  5.35e-01 
  3000| 6.69e+00  1.01e-03  6.64e-02 -8.01e+00  4.05e-05  5.88e-01 
  3250| 5.90e+02  1.77e-02  6.17e-02 -1.04e+03  1.00e-06  6.36e-01 
  3500| 1.24e+03  4.24e-03  1.05e-01 -9.61e+01  1.00e-06  6.84e-01 
  3750| 9.01e+02  3.08e-03  7.51e-02 -1.16e+02  1.00e-06  7.32e-01 
  4000| 8.55e+02  3.77e-03  8.21e-02 -1.39e+02  1.00e-06  7.80e-01 
  4250| 9.07e+02  4.23e-03  9.32e-02 -1.70e+02  1.00e-06  8.28e-01 
  4500| 1.00e+03  4.70e-03  1.06e-01 -2.13e+02  1.00e-06  8.78e-01 
  4750| 1.16e+03  5.38e-03  1.25e-01 -2.74e+02  1.00e-06  9.28e-01 
  5000| 1.42e+03  6.53e-03  1.60e-01 -3.73e+02  1.00e-06  9.78e-01 
  5250| 2.00e+03  4.67e-03  1.07e-01 -1.21e+02  1.00e-06  1.03e+00 
  5500| 9.81e+15  2.05e+10  4.02e+15 -3.13e+15  1.00e-06  1.07e+00 
  5750| 4.85e+15  1.39e+10  5.00e+15 -3.64e+15  1.00e-06  1.12e+00 
  6000| 2.84e+15  1.35e+10  5.20e+15 -3.75e+15  1.00e-06  1.17e+00 
  6250| 2.39e+16  6.01e+10  4.43e+15 -3.72e+15  1.00e-06  1.22e+00 
  6500| 6.36e+15  1.18e+10  2.00e+15 -2.52e+15  1.00e-06  1.27e+00 
  6750| 4.51e+15  3.59e+09  1.79e+15 -2.42e+15  1.00e-06  1.31e+00 
  7000| 1.35e+05  8.38e-01  4.97e+00 -5.33e+04  1.00e-06  1.36e+00 
  7250| 1.74e+15  1.47e+09  1.10e+15 -2.17e+15  1.00e-06  1.41e+00 
  7500| 1.45e+15  1.51e+09  1.16e+15 -2.21e+15  1.00e-06  1.45e+00 
  7750| 1.32e+15  9.99e+08  1.16e+15 -2.21e+15  1.00e-06  1.50e+00 
  8000| 1.26e+15  9.87e+08  1.10e+15 -2.19e+15  1.00e-06  1.55e+00 
  8250| 1.18e+15  9.71e+08  1.04e+15 -2.17e+15  1.00e-06  1.60e+00 
  8500| 1.10e+15  9.51e+08  1.04e+15 -2.17e+15  1.00e-06  1.64e+00 
  8750| 1.05e+15  9.28e+08  9.75e+14 -2.15e+15  1.00e-06  1.69e+00 
  9000| 1.02e+15  9.05e+08  9.13e+14 -2.12e+15  1.00e-06  1.74e+00 
  9250| 9.95e+14  8.82e+08  8.24e+14 -2.08e+15  1.00e-06  1.78e+00 
  9500| 7.01e+04  1.03e+00  8.99e+04  4.42e+04  1.00e-06  1.83e+00 
  9750| 9.47e+14  8.38e+08  6.86e+14 -2.03e+15  1.00e-06  1.88e+00 
 10000| 9.25e+14  8.18e+08  6.32e+14 -2.01e+15  1.00e-06  1.93e+00 
 10250| 9.04e+14  7.98e+08  5.62e+14 -1.98e+15  1.00e-06  1.97e+00 
 10500| 8.92e+14  7.79e+08  5.43e+14 -1.97e+15  1.00e-06  2.02e+00 
 10750| 8.80e+14  7.61e+08  4.57e+14 -1.94e+15  1.00e-06  2.07e+00 
 11000| 8.68e+14  7.44e+08  4.37e+14 -1.93e+15  1.00e-06  2.11e+00 
 11250| 8.56e+14  7.27e+08  3.57e+14 -1.90e+15  1.00e-06  2.16e+00 
 11500| 8.45e+14  7.11e+08  3.57e+14 -1.90e+15  1.00e-06  2.21e+00 
 11750| 8.34e+14  6.96e+08  3.23e+14 -1.89e+15  1.00e-06  2.26e+00 
 12000| 8.26e+14  6.81e+08  2.92e+14 -1.88e+15  1.00e-06  2.30e+00 
 12250| 2.85e+18  3.79e+12  4.33e+17 -2.18e+17  1.00e-06  2.35e+00 
 12500| 8.05e+14  6.54e+08  1.96e+14 -1.84e+15  1.00e-06  2.40e+00 
 12750| 7.94e+14  6.42e+08  2.10e+14 -1.85e+15  1.00e-06  2.44e+00 
 13000| 7.85e+14  6.30e+08  9.82e+13 -1.80e+15  1.00e-06  2.49e+00 
 13250| 7.74e+14  6.18e+08  2.12e+13 -1.77e+15  1.00e-06  2.54e+00 
 13500| 7.65e+14  6.07e+08  8.36e+13 -1.72e+15  1.00e-06  2.58e+00 
 13750| 7.55e+14  5.96e+08  7.70e+13 -1.73e+15  1.00e-06  2.63e+00 
 14000| 3.87e+05  3.03e-01  4.06e+03 -9.15e+05  1.00e-06  2.68e+00 
 14250| 9.83e+04  7.66e-02  1.69e+02 -2.36e+05  1.00e-06  2.73e+00 
 14500| 3.25e+04  2.52e-02  1.06e+01 -7.93e+04  1.00e-06  2.77e+00 
 14750| 1.54e+04  1.19e-02  7.81e+00 -3.82e+04  1.00e-06  2.82e+00 
 15000| 2.52e+05  6.60e-01  2.24e+02 -2.29e+02  1.00e-06  2.87e+00 
 15250| 5.79e+03  4.41e-03  2.82e-01 -1.47e+04  1.00e-06  2.91e+00 
 15500| 4.05e+03  3.06e-03  7.13e-02 -1.04e+04  1.00e-06  2.96e+00 
 15750| 2.99e+03  2.25e-03  3.98e-02 -7.81e+03  1.00e-06  3.01e+00 
 16000| 2.30e+03  1.72e-03  4.76e-02 -6.08e+03  1.00e-06  3.06e+00 
 16250| 1.82e+03  1.40e-03  1.30e-01 -4.88e+03  1.00e-06  3.10e+00 
 16500| 1.48e+03  1.17e-03  5.81e-02 -4.03e+03  1.00e-06  3.15e+00 
 16750| 1.23e+03  9.99e-04  1.43e-01 -3.39e+03  1.00e-06  3.20e+00 
 17000| 1.04e+03  8.63e-04  6.04e-02 -2.89e+03  1.00e-06  3.25e+00 
 17250| 8.84e+02  7.54e-04  5.80e-02 -2.50e+03  1.00e-06  3.29e+00 
 17500| 7.63e+02  6.65e-04  6.13e-02 -2.19e+03  1.00e-06  3.34e+00 
 17750| 8.39e+02  1.06e-03  7.23e-02 -2.45e+03  1.00e-06  3.39e+00 
 18000| 7.16e+02  6.63e-04  6.27e-02 -2.11e+03  1.00e-06  3.44e+00 
 18250| 6.19e+02  5.86e-04  5.82e-02 -1.86e+03  1.00e-06  3.48e+00 
 18500| 5.41e+02  5.27e-04  4.54e-02 -1.66e+03  1.00e-06  3.53e+00 
 18750| 4.84e+02  4.81e-04  3.89e-02 -1.51e+03  1.00e-06  3.58e+00 
 19000| 4.37e+02  4.41e-04  4.37e-02 -1.38e+03  1.00e-06  3.63e+00 
 19250| 3.96e+02  4.05e-04  4.69e-02 -1.27e+03  1.00e-06  3.67e+00 
 19500| 3.60e+02  3.74e-04  4.41e-02 -1.17e+03  1.00e-06  3.72e+00 
 19750| 3.29e+02  3.46e-04  4.27e-02 -1.08e+03  1.00e-06  3.77e+00 
 20000| 4.21e+05  1.73e+00  1.54e+06 -7.69e+05  1.00e-06  3.82e+00 
 20250| 2.77e+02  2.99e-04  3.67e-02 -9.29e+02  1.00e-06  3.86e+00 
 20500| 2.55e+02  2.78e-04  3.46e-02 -8.67e+02  1.00e-06  3.91e+00 
 20750| 2.36e+02  2.60e-04  3.32e-02 -8.11e+02  1.00e-06  3.96e+00 
 21000| 2.19e+02  2.44e-04  2.90e-02 -7.61e+02  1.00e-06  4.00e+00 
 21250| 2.22e+02  1.75e-03  2.45e-02 -3.18e+02  1.00e-06  4.05e+00 
 21275| 2.28e+02  1.25e-04  2.49e-02 -3.17e+02  1.00e-06  4.06e+00 
------------------------------------------------------------------
status:  solved
timings: total: 4.06e+00s = setup: 1.94e-03s + solve: 4.05e+00s
	 lin-sys: 8.40e-01s, cones: 2.96e+00s, accel: 4.01e-02s
------------------------------------------------------------------
objective = -316.771726
------------------------------------------------------------------

julia> println("Optimal objective value: ", objective_value(model))
Optimal objective value: 316.7592859356823

julia> value.(W)
20-element Vector{Float64}:
 11.056837195150655
 14.412141673870124
 16.069320560678204
 18.812683697275116
 19.446560294277234
 10.043538437946458
 14.21919776045862
 20.624339310939654
 16.8699813991284
 19.91313353622495
 18.30335451413605
 22.548831159749465
  9.000498043918071
 20.319772433946493
 21.246420762427164
 12.55775609315935
 10.907945851054128
 13.760733220810389
 14.033249782461809
 10.705447564328303

julia> solution_summary(model)
* Solver : SCS

* Status
  Result count       : 1
  Termination status : OPTIMAL
  Message from the solver:
  "solved"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : 3.16759e+02
  Dual objective value : 3.16784e+02

* Work counters
  Solve time (sec)   : 4.05678e+00

Try Clarabel.jl instead:

import Clarabel
model = Model(Clarabel.Optimizer)

It got me a solution that seems much more reasonable:

julia> solution_summary(model)
* Solver : Clarabel

* Status
  Result count       : 1
  Termination status : OPTIMAL
  Message from the solver:
  "SOLVED"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : 7.43289e+00
  Dual objective value : 7.43289e+00

* Work counters
  Solve time (sec)   : 1.15616e+00
  Barrier iterations : 90
3 Likes

Thanks, that’s very helpful. I replicated this using Clarabel. A couple follow-up questions:

  1. Should I be reporting the SCS bug somewhere?

  2. What do you mean about “be suspicious of SDPs where Mosek struggles to solve”? Do you mean I should be suspicious of the solutions that Mosek is producing for such SDPs? Or more generally the SDP itself should be reformulated to be more amenable to a solver? Or something else?

I don’t think it’s obvious that this is a bug in SCS. It’s returned a solution that it determined satisfies the KKT conditions for your problem according to its own internal tolerances. There’s no particular reason to think that that hasn’t worked, it’s just that those tolerances are not very tight relative to the interior point solvers you are comparing against.

That said, I tried this problem with Mosek, SCS, Clarabel and COSMO and got 4 different answers for the optimal objective value (NB: you must use a consistent seed value in the random number generation to make this comparison, but they still come out different).

Worth noting also that the norms of both the primal and dual optimizers are really big : ~5e6 for the primal and ~3e5 for the dual (in Clarabel at least). Maybe therefore not that surprising that the different results for the objective value vary a bit between solvers.

2 Likes

The SCS primal residual is 2e+02 and solution violates bounds of [0, 1] to return 10. That seems like more than a minor tolerance issue? Or perhaps i don’t fully understand SCS tolerances.

I had a brief look at the SCS code, and I think (?) it is printing the norm of the primal residual there, and not some normalization of the residual. The convergence test itself is in part relative to the norm of the problem data and matrix-vector products appearing in the current iterate. See here for the relevant bit of code, or here for the docs.

I would guess that what is happening is that the solver has a large primal residual, but a normalised residual that is below the convergence threshold because the primal iterate itself is large. If I set eps_rel to 0.0 for SCS then it doesn’t report convergence because it can’t achieve its absolute tolerance condition.

3 Likes

I understood that to just mean that Mosek (as a commercial product) is broadly-speaking considered a reliable solver and if it’s struggling then this may be indicative of a difficult or ill-conditioned SDP. (Of course that’s not to say many open source solvers aren’t reliable depending on the scale and nature of the problem structure.)

1 Like