# JuMP & SumOfSquares generate SDPs with ~18x more constraints than YALMIP

Hi, I am attempting to rewrite MATLAB code, which is slow due to Yalmip’s parsing speed.
The resulting problem that gets sent to the solver from JuMP is much larger (and slower to solve) than Yalmip’s output.
I don’t think I am using any “inefficient” functions for building the problem, but the differences are huge and I’d really appreciate help from someone more experienced.

The difference in size can look like this for the same problem with the same optimal value:
Matlab:

``````Optimizer  - Constraints            : 106
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 57
``````

Julia:

`````` Constraints            : 1887
Cones                  : 1
Scalar variables       : 108
``````

The codes for both are below. They can be scaled by manipulating the variables `n,m`
(the results above are for `n=10, m=50`).

The Julia code calls MATLAB in the beginning in order to
generate the same input data, this is just to show that we are indeed solving the same problem.

Matlab code:

``````%% Data preparation
clear all
close all

rng(3)

% State space dimension
n = 10;
% Number of neurons
m = 50;
% System
A = randn(n,n);
A = 0.7 * A / norm(A,2);
B = round(rand(n,2)); B = B / norm(B,2);
% Neural network
W1 = randn(m,n); W1 = W1 / norm(W1,2);
W2 = randn(2,m); W2 = W2 / norm(W2,2);
B = B*W2; % (no need to use B and W2 separately)
% Coordinate transformation x <- Qx
Q = randn(n,n);
A = inv(Q)*A*Q;
B = inv(Q)*B;
W1 = W1*Q;

%% Verification

% Degree of V
d = 2;

% Define Yalmip variables
H = sdpvar(n,n); % Matrix parametrizing V
x = sdpvar(n,1); % State
u = sdpvar(m,1); % Control

% Lyapunov candidate V
V = x'*H*x;

% Equality constraints u*(u-W1*x) == 0
h = u.*(u-W1*x);

% Polynomial multipliers for equality constraints
R = []; CR = [];
for i = 1:numel(h)
[r,cr] = polynomial([x;u], d - degree(h(i)));
CR =  [CR ; cr];
R  =  [R  ; r];
end

Vp = x'*A'*H*A*x + 2*u'*B'*H*A*x + u'*B'*H*B*u; % Equivalent to Vp = replace(V,x,A*x + B*u);
% Constraints
con = [ sos(V - Vp - x'*x  - R'*h), H >= 0 ];

% Solver options
options = sdpsettings('solver','mosek');

% Solve
sol = solvesos(con,norm(H(:),2)^2,options,[H(:);CR])

fprintf("objective = %6.6f\n",norm(double(H(:)),2)^2)
``````

Julia code:

``````using MosekTools
using JuMP
using DynamicPolynomials
using LinearAlgebra
using SumOfSquares
using MATLAB

# generate same data as MATLAB
mat"
clear all, close all
rng(3)
% State space dimension
n = 10;
% Number of neurons
m = 50;
% System
A = randn(n,n);
A = 0.7 * A / norm(A,2);
B = round(rand(n,2)); B = B / norm(B,2);
% Neural network
W1 = randn(m,n); W1 = W1 / norm(W1,2);
W2 = randn(2,m); W2 = W2 / norm(W2,2);
B = B*W2; % (no need to use B and W2 separately)
% Coordinate transformation x <- Qx
Q = randn(n,n);
A = inv(Q)*A*Q;
B = inv(Q)*B;
W1 = W1*Q;
\$Q = Q;
\$A = A;
\$B = B;
\$W1 = W1;
\$n = n;
\$m = m;
"
n = Int(n)
m = Int(m)
## Verification
d = 2; # degree of V
model = SOSModel(Mosek.Optimizer)

# Variables
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n]) # State
@polyvar(u[1:m]) # Control

# Lyapunov candidate V
V = x'*H*x;

# constraints u.*(u-W1*x) == 0
h = u.*(u-W1*x);
R = Vector{Polynomial{true, VariableRef}}(undef,length(h))
for i = 1:length(h)
global R,h
ds = d - maxdegree(h[i])
R[i] = @variable(model,[1:1], Poly(monomials(x, 0:ds )))[1];
end

Vp = subs(V,x => A*x + B*u)
# Constraints
@constraint(model, V - Vp - x'*x - R'*h >= 0  ); # (V - Vp - x'*x) on set (h == 0)
@objective(model, Min, sum(H.^2)  ); # norm of V's Gram matrix

JuMP.optimize!(model)
@show objective_value(model)
``````

``````Julia 1.6.0
JuMP v0.21.8
SumOfSquares v0.4.6
DynamicPolynomials v0.3.18
MosekTools v0.9.4
MATLAB v0.8.2
``````

2 Likes

cc @blegat this seems like a question for you.

After some playing around I think the problem is due to the norm^2 in the objective function.
Here are some shorter codes for testing.
Julia:

``````using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
##

n = 100;
model = SOSModel(Mosek.Optimizer)
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n])
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

@objective(model, Min, sum(H[1:end].^2) ); # slower solve than Yalmip
# @objective(model, Min, sum(H[1:end]) ); # (slightly) faster solve than Yalmip
JuMP.optimize!(model)
@show objective_value(model)
``````

Matlab:

``````n = 100;
options = sdpsettings('solver','mosek','sos.newton',0,'sos.congruence',0); % turn off reductions
H = sdpvar(n,n); % Matrix parametrizing V
x = sdpvar(n,1); % State
V = x'*H*x;
con = [ sos(V - x'*x), H>=0 ];

sol = solvesos(con,norm(H(:),2)^2,options,[H(:)]);
% sol = solvesos(con,sum(H(:)),options,[H(:)])

``````

On my machine, Mosek takes `12s` to solve the problem from Yalmip and `37s` to solve the JuMP’s problem.

The codes include commented lines with linear objective, to show that the solving speeds (and problem sizes) are almost the same without the norm.

Is there a better way to write the norm^2?

Disregarding this particular optimization problem (I am not much into JuMP yet), you can use norm from LinearAlgebra package to compute the Frobenius norm of a matrix. It seems working faster than what you have right now.

``````julia> using BenchmarkTools

julia> n = 1000
1000

julia> H = rand(n,n)
1000×1000 Matrix{Float64}:
0.338794  0.752583   0.232137   0.120831   0.63908   0.922663  0.702443   0.311245  …  0.733689  0.505383  0.811657  0.476241  0.993609    0.327517
0.971248  0.505866   0.993239   0.286642   0.415582  0.48407   0.841185   0.112106     0.909769  0.170488  0.185902  0.356967  0.35784     0.0755087
0.700314  0.249442   0.43525    0.0272597  0.494095  0.494849  0.0474008  0.595768     0.18119   0.289972  0.170669  0.937315  0.634848    0.00903801
0.583128  0.685579   0.312781   0.114116   0.68168   0.86569   0.992571   0.845365     0.305631  0.543829  0.129549  0.826859  0.155827    0.663373
⋮                                                    ⋮                              ⋱            ⋮
0.885583  0.430756   0.101611   0.662383   0.835605  0.293109  0.0600033  0.296186     0.61711   0.777246  0.675922  0.130289  0.884445    0.594788
0.49874   0.428022   0.215291   0.715348   0.189634  0.468034  0.684738   0.362982     0.529678  0.558699  0.514717  0.867737  0.05932     0.89419
0.637278  0.795315   0.517079   0.452919   0.299841  0.21091   0.161022   0.389068     0.165848  0.968597  0.747196  0.280569  0.00660625  0.527883
0.907406  0.0814852  0.0709207  0.501657   0.782972  0.883467  0.432249   0.612925     0.480489  0.928168  0.273238  0.321604  0.0243705   0.345264

julia> @btime sum(H[:].^2)
1.805 ms (9 allocations: 15.26 MiB)
333357.85015906964

julia> @btime norm(H)^2
496.871 μs (2 allocations: 32 bytes)
333357.85015906964
``````

I guess the reason for this difference is that by using `H[:]`, you are creating a (huge) temporary vector (this is also visible in the output from the `@btime` macro (some 15 MB allocated on the heap). The `LinearAlgebra.norm` obviously avoids this.

I still feel there must be a more efficient way because computing the Frobenius norm of a matrix (or just two-norm of a vector) involves finding the square root. If you then square the result, finding the square root seems unnecessary.

You can also skip the `[:]` in your code. It helps, but still does some allocation and is not as fast as the solution based on `norm`.

``````@btime sum(H.^2)
1.204 ms (7 allocations: 7.63 MiB)
333357.85015906964
``````

Perhaps this:

``````@btime mapreduce(x->x^2, +, H)
305.525 μs (1 allocation: 16 bytes)
333357.85015906964
``````

… but well… I tried this trick for computing the sum of squres of the elements of a matrix with your simplified example and it does not change the timing at all. Apparently, what I have written above is relevant for constant matrices but not for these more complicated array types used in JuMP:

``````using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
##

n = 100;
model = SOSModel(Mosek.Optimizer)
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n])
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

#@objective(model, Min, sum(H[1:end].^2) ); # slower solve than Yalmip
#@objective(model, Min, sum(H[1:end]) ) # (slightly) faster solve than Yalmip
#@objective(model, Min, norm(H,2)^2 )
#@objective(model, Min, norm(H,2)) # do you really need to square the norm in the cost?
@objective(model, Min, mapreduce(x->x^2,+,H))
JuMP.optimize!(model)
@show objective_value(model)
``````

By the way, in your Matlab code

I would perhaps use the backslash instead of explicit computation of the inverse as in

``````A = Q\A*Q
``````

I am only writing this here since I guess that you may feel tempted to keep the same habit in Julia too.

Just to answer your comment `# do you really need to square the norm in the cost?`
Squaring the norm makes great difference for the solver.
In Yalmip, I get solution time `12s` with norm^2 and `76s` without squaring

I have also tried modelling the norm^2 according to the docs
https://jump.dev/MathOptInterface.jl/stable/reference/standard_form/#MathOptInterface.RotatedSecondOrderCone
as follows:

``````@variable(model,t )
@constraint(model, [t;1/2;H[:]] in RotatedSecondOrderCone())
@objective(model, Min, t )
``````

But that is even slower than doing `@objective(model, Min, sum(H[1:end].^2) )`

1 Like

What if you just use

``````@constraint(model, [t; vec(H)] in SecondOrderCone())
``````

Does that change how many variables are in the problem?

(Mosek doesn’t support quadratic objectives, JuMP is doing the RotatedSecondOrderCone transformation automatically, but something might be going wrong.)

I get the error
`LoadError: Cannot add MathOptInterface.SecondOrderCone constraint on a matrix variable`
It seems this error is thrown whenever `H` is defined either is `PSD` or `Symmetric` (by `@variable(model,H[1:n,1:n],PSD / Symmetric))`

When I created `H` as a general matrix and manually constrained it to be symmetric and PSD it worked, and so far it seems that on the smaller example that it gets very close to Yalmip’s problem performance (`13s` in Mosek).

Here is the code, it doesn’t make much sense to me to do the `lower.==upper` manually, but other more logical solutions (such as`@constraint(model, Symmetric(H) in PSDCone())`) threw the same error as above

``````using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
using LinearAlgebra
##

n = 100;
model = SOSModel(Mosek.Optimizer)
@variable(model,H[1:n,1:n]);
@polyvar(x[1:n])
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

@variable(model,t)
# @constraint(model, [t;1/2;vec(H)] in RotatedSecondOrderCone())
@constraint(model, [t; vec(H)] in SecondOrderCone())
@objective(model, Min, t )
@constraint(model, H in PSDCone())
lower = H[LinearAlgebra.tril(trues(size(H)),-1)]
upper = H[LinearAlgebra.triu(trues(size(H)),+1)]
@constraint(model,lower .== upper)

JuMP.optimize!(model)
@show objective_value(model)
``````

This is because Mosek handles PSD variables differently. The default reformulation adds an extra `n^2` scalar variables, adds linear constraints to make them equal to `H`, and then adds the norm constraint on the scalar variables. That explains the different in the variable and constraint count.

Yalmip might be doing a better reformulation. But @blegat is the one to ask on that front.

1 Like

Ahh, this line is wrong, it should have been
`upper = H'[LinearAlgebra.tril(trues(size(H)),-1)]`.
The wrong one vectorized the variables in a different order than `lower` so the matrix wasn’t actually constrained to be symmetric.

With the fix, it takes around `24s` which is still an improvement though…

As discussed in YALMIP vs JuMP - #10 by blegat, YALMIP automatically dualizes the problem before giving it to Mosek. To have the same behavior with JuMP or SumOfSquares, use GitHub - jump-dev/Dualization.jl: Automatic dualization feature for MathOptInterface.jl and `model = Model(dual_optimizer(Mosek.Optimizer))`.

2 Likes

That actually made it slightly slower.
The Matlab script is the same

Julia script:

``````using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
using Dualization
##
n = 100;
# model = SOSModel(Mosek.Optimizer) # Time: 38.84
model = SOSModel(dual_optimizer(Mosek.Optimizer)) # Time: 42.02
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n])
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

@objective(model, Min, sum(H[1:end].^2) ); # slower solve than Yalmip
JuMP.optimize!(model)
@show objective_value(model)
@show solve_time(model)
``````

Mosek log with `SOSModel(dual_optimizer(Mosek.Optimizer))`:

``````Problem
Name                   :
Objective sense        : max
Type                   : CONIC (conic optimization problem)
Constraints            : 15152
Cones                  : 1
Scalar variables       : 15153
Matrix variables       : 2
Integer variables      : 0

Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 10100
Optimizer  - Cones                  : 2
Optimizer  - Scalar variables       : 10103             conic                  : 1010
Optimizer  - Semi-definite variables: 2                 scalarized             : 1010
Factor     - setup time             : 26.59             dense det. time        : 0.00
Factor     - ML order time          : 24.20             GP order time          : 0.00
Factor     - nonzeros before factor : 2.55e+07          after factor           : 4.15
Factor     - dense dim.             : 2                 flops                  : 2.07
``````

Mosek log with Yalmip:

``````Problem
Name                   :
Objective sense        : min
Type                   : CONIC (conic optimization problem)
Constraints            : 156
Cones                  : 1
Scalar variables       : 57
Matrix variables       : 2
Integer variables      : 0

Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 156
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 57                conic                  : 57
Optimizer  - Semi-definite variables: 2                 scalarized             : 6160
Factor     - setup time             : 0.03              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 1.21e+04          after factor           : 1.21e+04
Factor     - dense dim.             : 0                 flops                  : 2.41e+08
``````

I’m not sure what Yalmip does but it seems that it reduces the problem in some preprocessing step.
Here `H` is the matrix with minimal Frobenius norm such that all its eigenvalues are smaller than 1. The optimal solution is the identity.
Therefore it seems that it’s using something like a identity matrix `H`:

``````using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
using Dualization
n = 100;
model = SOSModel(dual_optimizer(Mosek.Optimizer))
@variable(model,h[1:n]); # Matrix parametrizing V
using LinearAlgebra
H = zeros(AffExpr, n, n)
for i in 1:n
H[i, i] = h[i]
end
@constraint(model, Symmetric(H - I) in PSDCone())
@variable(model, t)
@constraint(model, [t; h] in SecondOrderCone())

@objective(model, Min, t)
JuMP.optimize!(model)
@show objective_value(model)
@show solve_time(model)
``````

I get

``````Problem
Name                   :
Objective sense        : max
Type                   : CONIC (conic optimization problem)
Constraints            : 101
Cones                  : 1
Scalar variables       : 101
Matrix variables       : 1
Integer variables      : 0

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00
Lin. dep.  - tries                  : 1                 time                   : 0.00
Lin. dep.  - number                 : 0
Presolve terminated. Time: 0.00
Problem
Name                   :
Objective sense        : max
Type                   : CONIC (conic optimization problem)
Constraints            : 101
Cones                  : 1
Scalar variables       : 101
Matrix variables       : 1
Integer variables      : 0

Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 100
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 101               conic                  : 101
Optimizer  - Semi-definite variables: 1                 scalarized             : 5050
Factor     - setup time             : 0.00              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 5050              after factor           : 5050
Factor     - dense dim.             : 0                 flops                  : 4.34e+05
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   1.0e+00  2.0e+00  9.8e+01  0.00e+00   1.000000000e+02   1.000000000e+00   1.0e+00  0.00
1   6.1e-01  1.2e+00  1.5e+02  -2.85e+00  1.841428571e+02   6.133724425e+00   6.1e-01  0.00
2   5.7e-01  1.1e+00  2.1e+01  2.70e+00   8.732344947e+01   7.858257767e+00   5.7e-01  0.01
3   4.4e-01  8.8e-01  2.5e+00  2.02e+00   3.641534896e+01   8.999653495e+00   4.4e-01  0.01
4   7.6e-02  1.5e-01  6.4e-01  2.68e+00   1.129674777e+01   9.711983042e+00   7.6e-02  0.01
5   2.5e-04  5.0e-04  9.1e-05  9.65e-01   1.000477716e+01   9.999478620e+00   2.5e-04  0.02
6   2.0e-08  2.1e-08  3.9e-15  1.00e+00   1.000000000e+01   1.000000000e+01   3.1e-11  0.02
Optimizer terminated. Time: 0.02

objective_value(model) = 9.999999995326595
solve_time(model) = 0.020170927047729492
0.020170927047729492
``````

Maybe this problem is too reduced and we need to focus on the one of your earlier post.
You need to try to stick to either standard conic form (equality constraints and conic variables) or geometric conic form (conic constraints and free variables).
Then, if you use the standard form and the solver uses geometric form (or vice versa), you need to use Dualization.
In the example above, I used 101 free variables (1 for `t` and 100 for `h`) so after dualization, it gives 101 equality constraints.
Then I have 1 LMI constraint which give 1 SDP variable after dualization (here the matrices are diagonal so it could be much simpler of course).
Then you have one constraint `[t; h] in SOC`. This could be interpreted as the variables `[t; h]` belong to the SOC cone in the standard form but as Mosek expect a standard form, Dualization should understand it in the geometric form so this is understood as `t` and `h` being free variables so in the dual we have a conic SOC variable of length 101 (which gives 1 cone and 101 variables).

3 Likes

I checked the code in your top post. I detailed the issue in LMI bridge · Issue #205 · jump-dev/SumOfSquares.jl · GitHub.
In short, SumOfSquares uses the standard conic form for SumOfSquares constraints.
In your case, you should use the geometric conic form as discussed in LMI bridge · Issue #205 · jump-dev/SumOfSquares.jl · GitHub.
As your polynomials are quadratic, you can in fact use JuMP directly, e.g.,

``````model = Model(dual_optimizer(Mosek.Optimizer))

# Variables
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n]) # State
@polyvar(u[1:m]) # Control

# Lyapunov candidate V
V = x'*H*x;

# constraints u.*(u-W1*x) == 0
h = u .* (u - W1 * x);

Vp = subs(V, x => A*x + B*u)

# Constraints

using LinearAlgebra
function matrix(poly)
M = Matrix{AffExpr}(undef, n + m, n + m)
v = [x; u]
for i in 1:(n + m)
for j in 1:(n + m)
coef = DynamicPolynomials.coefficient(poly, v[i] * v[j])
if i == j
M[i, j] = coef
else
M[j, i] = M[i, j] = coef / 2
end
end
end
return Symmetric(M)
end

# (V - Vp - x'*x) on set (h == 0)
@variable(model, R[1:m])
@constraint(model, matrix(V - Vp - x'*x - sum(R[i] * h[i] for i in 1:m)) in PSDCone());

@variable(model, t)
@constraint(model, [t; vec(H)] in SecondOrderCone())
@objective(model, Min, t)

JuMP.optimize!(model)
@show termination_status(model)
@show objective_value(model)
``````
5 Likes

Yes, this solves it!
Thanks!

1 Like