JuMP & SumOfSquares generate SDPs with ~18x more constraints than YALMIP

Hi, I am attempting to rewrite some MATLAB code in Julia; the MATLAB version is slow due to Yalmip’s parsing speed.
However, the problem that JuMP sends to the solver is much larger (and slower to solve) than what Yalmip produces.
I don’t think I am using any “inefficient” functions for building the problem, but the differences are huge and I’d really appreciate help from someone more experienced.

The difference in size can look like this for the same problem with the same optimal value:
Matlab:

Optimizer  - Constraints            : 106
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 57 

Julia:

Constraints            : 1887
Cones                  : 1
Scalar variables       : 108

The code for both is below. The problem can be scaled by changing the variables n and m
(the results above are for n=10, m=50).

The Julia code calls MATLAB at the beginning to generate the same input data; this is just to
show that we are indeed solving the same problem.

Matlab code:

%% Data preparation
clear all
close all

rng(3)

% State space dimension
n = 10;
% Number of neurons
m = 50;
% System
A = randn(n,n);
A = 0.7 * A / norm(A,2);
B = round(rand(n,2)); B = B / norm(B,2);
% Neural network 
W1 = randn(m,n); W1 = W1 / norm(W1,2);
W2 = randn(2,m); W2 = W2 / norm(W2,2);
B = B*W2; % (no need to use B and W2 separately)
% Coordinate transformation x <- Qx
Q = randn(n,n);
A = inv(Q)*A*Q;
B = inv(Q)*B;
W1 = W1*Q;

%% Verification

% Degree of V
d = 2;

% Define Yalmip variables
H = sdpvar(n,n); % Matrix parametrizing V
x = sdpvar(n,1); % State
u = sdpvar(m,1); % Control

% Lyapunov candidate V
V = x'*H*x;

% Equality constraints u*(u-W1*x) == 0
h = u.*(u-W1*x);

% Polynomial multipliers for equality constraints
R = []; CR = [];
for i = 1:numel(h)
    [r,cr] = polynomial([x;u], d - degree(h(i)));
    CR =  [CR ; cr];
    R  =  [R  ; r];
end

Vp = x'*A'*H*A*x + 2*u'*B'*H*A*x + u'*B'*H*B*u; % Equivalent to Vp = replace(V,x,A*x + B*u);
% Constraints
con = [ sos(V - Vp - x'*x  - R'*h), H >= 0 ];

% Solver options
options = sdpsettings('solver','mosek');

% Solve
sol = solvesos(con,norm(H(:),2)^2,options,[H(:);CR])

fprintf("objective = %6.6f\n",norm(double(H(:)),2)^2)

Julia code:

using MosekTools
using JuMP
using DynamicPolynomials
using LinearAlgebra
using SumOfSquares
using MATLAB

# generate same data as MATLAB 
mat"
clear all, close all
rng(3)
% State space dimension
n = 10;
% Number of neurons
m = 50;
% System
A = randn(n,n);
A = 0.7 * A / norm(A,2);
B = round(rand(n,2)); B = B / norm(B,2);
% Neural network 
W1 = randn(m,n); W1 = W1 / norm(W1,2);
W2 = randn(2,m); W2 = W2 / norm(W2,2);
B = B*W2; % (no need to use B and W2 separately)
% Coordinate transformation x <- Qx
Q = randn(n,n);
A = inv(Q)*A*Q;
B = inv(Q)*B;
W1 = W1*Q;
$Q = Q;
$A = A;
$B = B;
$W1 = W1;
$n = n;
$m = m;
"
n = Int(n)
m = Int(m)
## Verification
d = 2; # degree of V
model = SOSModel(Mosek.Optimizer)

# Variables 
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n]) # State
@polyvar(u[1:m]) # Control

# Lyapunov candidate V
V = x'*H*x;

# constraints u.*(u-W1*x) == 0
h = u.*(u-W1*x);
# Polynomial multipliers for the equality constraints
R = Vector{Polynomial{true, VariableRef}}(undef,length(h))
for i = 1:length(h)
    global R,h
    ds = d - maxdegree(h[i])
    R[i] = @variable(model,[1:1], Poly(monomials(x, 0:ds )))[1];
end

Vp = subs(V,x => A*x + B*u)
# Constraints
@constraint(model, V - Vp - x'*x - R'*h >= 0  ); # (V - Vp - x'*x) on set (h == 0)
@objective(model, Min, sum(H.^2)  ); # norm of V's Gram matrix

JuMP.optimize!(model)
@show objective_value(model)

Info about my installation/package versions:

Julia 1.6.0
JuMP v0.21.8
SumOfSquares v0.4.6
DynamicPolynomials v0.3.18
MosekTools v0.9.4
MATLAB v0.8.2

Thanks in advance.

2 Likes

cc @blegat this seems like a question for you.

After some playing around I think the problem is due to the norm^2 in the objective function.
Here are some shorter codes for testing.
Julia:

using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
##

n = 100;
model = SOSModel(Mosek.Optimizer)
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n]) 
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

@objective(model, Min, sum(H[1:end].^2) ); # slower solve than Yalmip
# @objective(model, Min, sum(H[1:end]) ); # (slightly) faster solve than Yalmip
JuMP.optimize!(model)
@show objective_value(model)

Matlab:

n = 100;
options = sdpsettings('solver','mosek','sos.newton',0,'sos.congruence',0); % turn off reductions
H = sdpvar(n,n); % Matrix parametrizing V
x = sdpvar(n,1); % State
V = x'*H*x;
con = [ sos(V - x'*x), H>=0 ];

sol = solvesos(con,norm(H(:),2)^2,options,[H(:)]); 
% sol = solvesos(con,sum(H(:)),options,[H(:)]) 

On my machine, Mosek takes 12 s to solve the problem coming from Yalmip and 37 s to solve the one coming from JuMP.

Both scripts include commented-out lines with a linear objective, to show that the solve times (and problem sizes) are almost the same without the norm.

Is there a better way to write the norm^2?

Setting this particular optimization problem aside (I am not much into JuMP yet), you can use norm from the LinearAlgebra package to compute the Frobenius norm of a matrix. It seems to work faster than what you have right now.

julia> using BenchmarkTools

julia> n = 1000
1000

julia> H = rand(n,n)
1000×1000 Matrix{Float64}:
 0.338794  0.752583   0.232137   0.120831   0.63908   0.922663  0.702443   0.311245  …  0.733689  0.505383  0.811657  0.476241  0.993609    0.327517
 0.971248  0.505866   0.993239   0.286642   0.415582  0.48407   0.841185   0.112106     0.909769  0.170488  0.185902  0.356967  0.35784     0.0755087
 0.700314  0.249442   0.43525    0.0272597  0.494095  0.494849  0.0474008  0.595768     0.18119   0.289972  0.170669  0.937315  0.634848    0.00903801
 0.583128  0.685579   0.312781   0.114116   0.68168   0.86569   0.992571   0.845365     0.305631  0.543829  0.129549  0.826859  0.155827    0.663373
 ⋮                                                    ⋮                              ⋱            ⋮                                         
 0.885583  0.430756   0.101611   0.662383   0.835605  0.293109  0.0600033  0.296186     0.61711   0.777246  0.675922  0.130289  0.884445    0.594788
 0.49874   0.428022   0.215291   0.715348   0.189634  0.468034  0.684738   0.362982     0.529678  0.558699  0.514717  0.867737  0.05932     0.89419
 0.637278  0.795315   0.517079   0.452919   0.299841  0.21091   0.161022   0.389068     0.165848  0.968597  0.747196  0.280569  0.00660625  0.527883
 0.907406  0.0814852  0.0709207  0.501657   0.782972  0.883467  0.432249   0.612925     0.480489  0.928168  0.273238  0.321604  0.0243705   0.345264

julia> @btime sum(H[:].^2)
  1.805 ms (9 allocations: 15.26 MiB)
333357.85015906964

julia> @btime norm(H)^2
  496.871 μs (2 allocations: 32 bytes)
333357.85015906964

I guess the reason for this difference is that by using H[:] you are creating a (huge) temporary vector; this is also visible in the output of the @btime macro (some 15 MiB allocated on the heap). LinearAlgebra.norm obviously avoids this.

I still feel there must be a more efficient way, because computing the Frobenius norm of a matrix (or the two-norm of a vector) involves taking a square root. If you then square the result, the square root was unnecessary work.
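For a plain numeric matrix there is also sum(abs2, H), which computes the sum of squares directly, with no temporary vector and no square root; a minimal sketch (untimed):

# Sum of squares of all entries; avoids allocating H[:] and never takes a square root
sum(abs2, H)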

You can also skip the [:] in your code. It helps, but still does some allocation and is not as fast as the solution based on norm.

@btime sum(H.^2)
  1.204 ms (7 allocations: 7.63 MiB)
333357.85015906964

Perhaps this:

@btime mapreduce(x->x^2, +, H)
  305.525 μs (1 allocation: 16 bytes)
333357.85015906964

… but well, I tried this trick for computing the sum of squares of the elements of a matrix with your simplified example and it does not change the timing at all. Apparently, what I wrote above is relevant for constant matrices but not for the more complicated array types used in JuMP:

using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
##

n = 100;
model = SOSModel(Mosek.Optimizer)
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n])
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

#@objective(model, Min, sum(H[1:end].^2) ); # slower solve than Yalmip
#@objective(model, Min, sum(H[1:end]) ) # (slightly) faster solve than Yalmip
#@objective(model, Min, norm(H,2)^2 )
#@objective(model, Min, norm(H,2)) # do you really need to square the norm in the cost?
@objective(model, Min, mapreduce(x->x^2,+,H))
JuMP.optimize!(model)
@show objective_value(model)

By the way, in your Matlab code I would use the backslash operator instead of explicitly computing the inverse, as in

A = Q\A*Q

I am only writing this here since I guess that you may feel tempted to keep the same habit in Julia too.
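The same advice applies in Julia; a minimal sketch of that transformation step, assuming A, B, W1 and Q are defined as in the script above:

# Apply the coordinate transformation without forming inv(Q) explicitly;
# Q \ M solves Q*X = M, which is typically faster and more accurate than inv(Q)*M.
A = (Q \ A) * Q
B = Q \ B
W1 = W1 * Q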

Just to answer your comment “# do you really need to square the norm in the cost?”:
squaring the norm makes a great difference for the solver.
In Yalmip, I get a solution time of 12 s with norm^2 and 76 s without squaring.

I have also tried modelling the norm^2 according to the docs
https://jump.dev/MathOptInterface.jl/stable/reference/standard_form/#MathOptInterface.RotatedSecondOrderCone
as follows:

@variable(model,t )
@constraint(model, [t;1/2;H[:]] in RotatedSecondOrderCone())
@objective(model, Min, t )

But that is even slower than doing @objective(model, Min, sum(H[1:end].^2) )

1 Like

What if you just use

@constraint(model, [t; vec(H)] in SecondOrderCone())

Does that change how many variables are in the problem?

(Mosek doesn’t support quadratic objectives, so JuMP is doing the RotatedSecondOrderCone transformation automatically, but something might be going wrong.)

I get the error
LoadError: Cannot add MathOptInterface.SecondOrderCone constraint on a matrix variable
It seems this error is thrown whenever H is declared as either PSD or Symmetric (by @variable(model, H[1:n,1:n], PSD) or @variable(model, H[1:n,1:n], Symmetric)).

When I created H as a general matrix and manually constrained it to be symmetric and PSD, it worked, and so far it seems that on the smaller example it gets very close to Yalmip’s performance (13 s in Mosek).

Here is the code. It doesn’t make much sense to me to do the lower .== upper manually, but other, more natural formulations (such as @constraint(model, Symmetric(H) in PSDCone())) threw the same error as above:

using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
using LinearAlgebra
##

n = 100;
model = SOSModel(Mosek.Optimizer)
@variable(model,H[1:n,1:n]);
@polyvar(x[1:n])
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

@variable(model,t)
# @constraint(model, [t;1/2;vec(H)] in RotatedSecondOrderCone())
@constraint(model, [t; vec(H)] in SecondOrderCone())
@objective(model, Min, t )
@constraint(model, H in PSDCone())
lower = H[LinearAlgebra.tril(trues(size(H)),-1)]
upper = H[LinearAlgebra.triu(trues(size(H)),+1)]
@constraint(model,lower .== upper)

JuMP.optimize!(model)
@show objective_value(model)

This is because Mosek handles PSD variables differently. The default reformulation adds an extra n^2 scalar variables, adds linear constraints to make them equal to H, and then adds the norm constraint on the scalar variables. That explains the difference in the variable and constraint counts.
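Written out by hand, that reformulation is roughly the following (a sketch of my understanding of the bridge, not its actual code):

# Hand-written version of the default reformulation (sketch only).
# H is a PSD matrix variable, so the norm constraint is not posed on it directly.
@variable(model, s[1:n, 1:n])       # extra n^2 scalar variables
@constraint(model, s .== H)         # linear constraints tying them to H
@variable(model, t)
@constraint(model, [t; 1/2; vec(s)] in RotatedSecondOrderCone())  # t >= sum of squares of s
@objective(model, Min, t)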

Yalmip might be doing a better reformulation. But @blegat is the one to ask on that front.

1 Like

Ahh, this line is wrong; it should have been
upper = H'[LinearAlgebra.tril(trues(size(H)),-1)]
The wrong version vectorized the variables in a different order than lower, so the matrix wasn’t actually constrained to be symmetric.

With the fix it takes around 24 s, which is still an improvement, though…
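An index-based way of writing the symmetry constraints sidesteps the vectorization-order pitfall entirely (just a sketch, equivalent to the fixed lower/upper version):

# Enforce H[i,j] == H[j,i] for every strictly upper-triangular pair,
# so no vectorization order can get mixed up.
for i in 1:n, j in (i+1):n
    @constraint(model, H[i, j] == H[j, i])
end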

As discussed in YALMIP vs JuMP - #10 by blegat, YALMIP automatically dualizes the problem before giving it to Mosek. To have the same behavior with JuMP or SumOfSquares, use https://github.com/jump-dev/Dualization.jl/ and model = Model(dual_optimizer(Mosek.Optimizer)).

2 Likes

That actually made it slightly slower.
The Matlab script is the same as before.

Julia script:

using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
using Dualization
##
n = 100;
# model = SOSModel(Mosek.Optimizer) # Time: 38.84  
model = SOSModel(dual_optimizer(Mosek.Optimizer)) # Time: 42.02   
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n]) 
V = x'*H*x;
@constraint(model, V - x'*x >= 0)

@objective(model, Min, sum(H[1:end].^2) ); # slower solve than Yalmip
JuMP.optimize!(model)
@show objective_value(model)
@show solve_time(model)

Mosek log with SOSModel(dual_optimizer(Mosek.Optimizer)):

Problem
  Name                   :
  Objective sense        : max
  Type                   : CONIC (conic optimization problem)
  Constraints            : 15152
  Cones                  : 1
  Scalar variables       : 15153
  Matrix variables       : 2
  Integer variables      : 0

Optimizer  - threads                : 4
Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 10100
Optimizer  - Cones                  : 2
Optimizer  - Scalar variables       : 10103             conic                  : 1010
Optimizer  - Semi-definite variables: 2                 scalarized             : 1010
Factor     - setup time             : 26.59             dense det. time        : 0.00
Factor     - ML order time          : 24.20             GP order time          : 0.00
Factor     - nonzeros before factor : 2.55e+07          after factor           : 4.15
Factor     - dense dim.             : 2                 flops                  : 2.07

Mosek log with Yalmip:

Problem
  Name                   :                 
  Objective sense        : min             
  Type                   : CONIC (conic optimization problem)
  Constraints            : 156             
  Cones                  : 1               
  Scalar variables       : 57              
  Matrix variables       : 2               
  Integer variables      : 0               

Optimizer  - threads                : 4               
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 156
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 57                conic                  : 57              
Optimizer  - Semi-definite variables: 2                 scalarized             : 6160            
Factor     - setup time             : 0.03              dense det. time        : 0.00            
Factor     - ML order time          : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 1.21e+04          after factor           : 1.21e+04        
Factor     - dense dim.             : 0                 flops                  : 2.41e+08  

I’m not sure what Yalmip does, but it seems that it reduces the problem in some preprocessing step.
Here H is the matrix with minimal Frobenius norm such that all its eigenvalues are at least 1 (the constraint x'(H - I)x >= 0 means H - I is PSD), so the optimal solution is the identity.
Therefore it seems that it’s using something like an identity matrix H; here is a reduced version with a diagonal H:

using MosekTools
using JuMP
using DynamicPolynomials
using SumOfSquares
using Dualization
n = 100;
model = SOSModel(dual_optimizer(Mosek.Optimizer))
@variable(model,h[1:n]); # Matrix parametrizing V
using LinearAlgebra
H = zeros(AffExpr, n, n)
for i in 1:n
    H[i, i] = h[i]
end
@constraint(model, Symmetric(H - I) in PSDCone())
@variable(model, t)
@constraint(model, [t; h] in SecondOrderCone())

@objective(model, Min, t)
JuMP.optimize!(model)
@show objective_value(model)
@show solve_time(model)

I get

Problem
  Name                   :                 
  Objective sense        : max             
  Type                   : CONIC (conic optimization problem)
  Constraints            : 101             
  Cones                  : 1               
  Scalar variables       : 101             
  Matrix variables       : 1               
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.00            
Lin. dep.  - number                 : 0               
Presolve terminated. Time: 0.00    
Problem
  Name                   :                 
  Objective sense        : max             
  Type                   : CONIC (conic optimization problem)
  Constraints            : 101             
  Cones                  : 1               
  Scalar variables       : 101             
  Matrix variables       : 1               
  Integer variables      : 0               

Optimizer  - threads                : 4               
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 100
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 101               conic                  : 101             
Optimizer  - Semi-definite variables: 1                 scalarized             : 5050            
Factor     - setup time             : 0.00              dense det. time        : 0.00            
Factor     - ML order time          : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 5050              after factor           : 5050            
Factor     - dense dim.             : 0                 flops                  : 4.34e+05        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  2.0e+00  9.8e+01  0.00e+00   1.000000000e+02   1.000000000e+00   1.0e+00  0.00  
1   6.1e-01  1.2e+00  1.5e+02  -2.85e+00  1.841428571e+02   6.133724425e+00   6.1e-01  0.00  
2   5.7e-01  1.1e+00  2.1e+01  2.70e+00   8.732344947e+01   7.858257767e+00   5.7e-01  0.01  
3   4.4e-01  8.8e-01  2.5e+00  2.02e+00   3.641534896e+01   8.999653495e+00   4.4e-01  0.01  
4   7.6e-02  1.5e-01  6.4e-01  2.68e+00   1.129674777e+01   9.711983042e+00   7.6e-02  0.01  
5   2.5e-04  5.0e-04  9.1e-05  9.65e-01   1.000477716e+01   9.999478620e+00   2.5e-04  0.02  
6   2.0e-08  2.1e-08  3.9e-15  1.00e+00   1.000000000e+01   1.000000000e+01   3.1e-11  0.02  
Optimizer terminated. Time: 0.02    

objective_value(model) = 9.999999995326595
solve_time(model) = 0.020170927047729492
0.020170927047729492

Maybe this problem is too reduced and we need to focus on the one from your earlier post.
You need to try to stick to either standard conic form (equality constraints and conic variables) or geometric conic form (conic constraints and free variables).
Then, if you use the standard form and the solver uses geometric form (or vice versa), you need to use Dualization.
In the example above, I used 101 free variables (1 for t and 100 for h), so after dualization this gives 101 equality constraints.
Then I have 1 LMI constraint, which gives 1 SDP variable after dualization (here the matrices are diagonal, so it could be much simpler of course).
Then there is one constraint [t; h] in SOC. This could be interpreted in standard form as the variables [t; h] belonging to the SOC cone, but since Mosek expects standard form, Dualization should read it in geometric form, i.e., with t and h as free variables, so in the dual we get a conic SOC variable of length 101 (which gives 1 cone and 101 variables).
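For reference, here is a toy sketch (standalone throwaway models, not the problem above) contrasting the two ways of writing the same second-order-cone requirement:

using JuMP

# Geometric conic form: free variables plus a conic constraint.
geo = Model()
@variable(geo, t)
@variable(geo, h[1:3])
@constraint(geo, [t; h] in SecondOrderCone())

# Standard conic form: a conic variable block plus equality constraints
# linking it to the rest of the model.
std = Model()
@variable(std, y[1:4] in SecondOrderCone())
@variable(std, t2)
@variable(std, h2[1:3])
@constraint(std, y .== [t2; h2])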

3 Likes

I checked the code in your top post. I detailed the issue in https://github.com/jump-dev/SumOfSquares.jl/issues/205.
In short, SumOfSquares uses the standard conic form for SumOfSquares constraints.
In your case, you should use the geometric conic form as discussed in https://github.com/jump-dev/SumOfSquares.jl/issues/205.
As your polynomials are quadratic, you can in fact use JuMP directly, e.g.,

model = Model(dual_optimizer(Mosek.Optimizer))

# Variables
@variable(model,H[1:n,1:n],PSD); # Matrix parametrizing V
@polyvar(x[1:n]) # State
@polyvar(u[1:m]) # Control

# Lyapunov candidate V
V = x'*H*x;

# constraints u.*(u-W1*x) == 0
h = u .* (u - W1 * x);

Vp = subs(V, x => A*x + B*u)

# Constraints

using LinearAlgebra
# Build the Gram matrix of the quadratic form poly in the variables [x; u]
function matrix(poly)
    M = Matrix{AffExpr}(undef, n + m, n + m)
    v = [x; u]
    for i in 1:(n + m)
        for j in 1:(n + m)
            coef = DynamicPolynomials.coefficient(poly, v[i] * v[j])
            if i == j
                M[i, j] = coef
            else
                M[j, i] = M[i, j] = coef / 2
            end
        end
    end
    return Symmetric(M)
end

# (V - Vp - x'*x) on set (h == 0)
@variable(model, R[1:m])
@constraint(model, matrix(V - Vp - x'*x - sum(R[i] * h[i] for i in 1:m)) in PSDCone());

@variable(model, t)
@constraint(model, [t; vec(H)] in SecondOrderCone())
@objective(model, Min, t)

JuMP.optimize!(model)
@show termination_status(model)
@show objective_value(model)

5 Likes

Yes, this solves it!
Thanks!

1 Like