SDP trace minimization with affine constraints: help to formulate the problem

Hi, I need to solve the following convex optimization problem:
min_{X}\quad tr(X) s.t. A(X)=b and X semi-definite positive,
where A computes rank-1 projections: A(X)=[<aᵢaᵢᵀ,X>]_{i=1}^n.

However I never really used optimization packages in julia and I’m a bit lost. I formulated the problem in JuMP as follows:

d = 5
m = Model(with_optimizer(SCS.Optimizer))
@variable(m, X[1:d, 1:d], PSD)
@constraint(m, [Ω[:,i]'*X*Ω[:,i] for i in 1:size(Ω, 2)] .== b)
@objective(m, Min, tr(X))
solution = value.(X)

I quickly tried different solvers supporting SDP optimization:
SeDuMi > requires a license
Mosek > requires a license
CSDP > segfault
ProxSDP > looks like it is working, but quite slow for d>≈10
SCS > faster for intermediate values of d, but does not scale very well either

So my questions would be :

  • Is there a preference, between the constraint written above and ‖A(x)-b‖≤ε, i.e. something like @constraint(m, cat([ε], [Ω[:,i]'*X*Ω[:,i] for i in 1:size(Ω,2)] - s; dims=1) in SecondOrderCone())? Will it be solved differently by the solver? It seems to be faster to define the equality constraint, but I don’t know what’s happening in the background.
  • I could compute the operator A more efficiently, e.g. using the cholesky decomposition of X and using 1 matrix product, but I couldn’t call cholesky on X, which has a specific type. Is there a way to code the constraint by a user-defined function (working on arbitrary matrices) ?
  • If this is possible, is there a difference between doing so and the code above from the optimizer’s perspective? I mean, does JuMP infers some information on the nature of the constraint by analyzing the code (i.e. recognize it’s an affine constraint and leverage it), or is it just seen as a black box function? I understand e.g. that for the objective, using “simple” functions has the advantage that first-order methods can use auto-differentiation (although it might be more efficient to register a user-defined function/gradient), but I don’t understand if there are similar advantages regarding the constraints.
  • If I define an “efficient” function to compute the constraint, is there a way (with JuMP or any other optimization package) to solve the problem more efficiently?

Sorry if the questions are not very clear, I think I don’t know enough about how optimization frameworks work and how they connect to the different solvers to understand what is really going on when running the code.

Note: I guess for large scale instances, it would be relevant to optimize directly using a cholesky factorization of X, but for now I just wanted to try something simple.

It seems to be faster to define the equality constraint

In almost all cases it will be.

Is there a way to code the constraint by a user-defined function (working on arbitrary matrices) ?

Two options.

If building the constraint takes too long, you can always scalarize it which should be fast:

n = size(Ω, 2)
for k in 1:n
    @constraint(m, sum(Ω[i, k] * X[i, j] * Ω[j, k] for i in 1:n, j in 1:n) == b[k])

Otherwise, yes, you can use a user-defined function:

function foo(Ω, X)
    return [Ω[:, i]' * X * Ω[:, i] for i in 1:size(Ω, 2)]
@constraint(m, foo(Ω, X) .== b)

is there a difference between doing so and the code above from the optimizer’s perspective?

No. JuMP will normalize the constraint before passing to the solver.

Basically, JuMP’s goal is to take the problem you write down and convert it into a format that a solver understands. This format is defined by MathOptInterface. You might find it instructive to read the MOI docs: Manual · MathOptInterface

You should get better performance by entering the dual problem in JuMP for SCS, SeDuMi and ProxSDP. There is an automatic dualization GSOC proposal to automate that

1 Like

Thanks, it makes sense. I read a bit MOI docs, what I understand is that constraints are defined as the result of “some function” belonging to “some set”. That’s why I was asking about the difference between the two scenarios: I guess that when the constraints as scalar, it’s relatively easy (or maybe not?) for JuMP to understand that these are linear constraints, and rewrite them using the MOI type for linear functions. If using a more complex (and vector-valued) function, I guess it might be more difficult and JuMP would just convert the constraint to the equivalent MOI constraint (enforcing the result of this complex and user-defined function to belong to some set).

Regarding the user-defined function, is there a way I can do something like that (in place of your function foo):

function A(Ω,X)
    U = cholesky(X).U
    sum((Ω'*U').^2; dims=2)

which is much faster? I get an error because cholesky is not defined for ::Symmetric{JuMP.VariableRef,Array{JuMP.VariableRef,2}}), but I guess there are many other cases where it is restrictive to work on this specific type ?

And regarding the dual problem, sorry but do you have an example somewhere of how to do so? I’ve seen in the doc functions to look e.g. at the dual value, but nothing about entering manually the dual problem.

I also tried in parallel to work on the related non-convex problem: \min_U Tr(UUᵀ) (i.e. \min ‖U‖_F) s.t. A(UUᵀ)=b, i.e. just using a low-rank factorization of X (for a fixed rank). JuMP formulation:

cstr(Ω, U) = sum((Ω'*U).^2; dims=2)
@variable(m, U[1:d, 1:r])
@constraint(m, cstr(Ω, U) .== b)
@objective(m, Min, sum(U.^2))

I tried to refer to Installation Guide · JuMP to choose an adapted solver, but I couldn’t really get it to work. If I understand correctly, with this formulation I have both quadratic function and quadratic constraints.

  • Ipopt: the objective is always 0… and the returned value for U is 0 as well, with the message EXIT: Converged to a point of local infeasibility. Problem may be infeasible.. (Trying with 400 variables, 320 equality constraints, random Ω, so not sure if the problem comes from my formulation of the problem, or just from the solver. Could it be an initialization-related issue?).
  • ECOS, SCS, OSQP: ERROR: LoadError: Constraints of type MathOptInterface.ScalarQuadraticFunction{Float64}-in-MathOptInterface.EqualTo{Float64} are not supported by the solver and there are no bridges that can reformulate it into supported constraints. → is it really not supported, or should I rewrite it differently?

Would you advise to look directly at packages such as NLOpt ?

I would not work with the cholesky decomposition. The semidefinite programming are there to handle matrices that possess a cholesky decomposition in a convex way (by seeing them as semidefinite matrices whose cone have a self-concordant barrier). I would not expect an NLP solver to work better.

The dual of
min tr(X)
A(x) = b
X is PSD
max b^T y
I - sum A_i y_i is PSD
y free
so for modeling the dual, do

model = Model(...)
@variable(model, y[1:size(Ω, 2)])
using LinearAlgebra
@constraint(Symmetric(I - sum(Ω[:,i] * Ω[:,i]' * y[i] for i in 1:size(Ω, 2))) in PSDCone())
@objective(model, Max, b' * y)
1 Like

Oh OK, I thought you meant some package could use both primal/dual formulations at the same time. And if you are interested in the argmin, I guess you have to use complementary slackness rules to find the primal optimal variables?

Regarding the low-rank factorization (not Cholesky, I think I made a mistake in the first post), I think it can be useful when the dimension is large, just to reduce the memory usage.

You don’t need to use complementary slackness, you can get it with dual

If you want a solver that exploits the fact that the optimal solution is low rank you can try GitHub - mariohsouto/ProxSDP.jl: Semidefinite programming optimization solver
To exploit the rank 1 of A_i, IIRC SDPT3, DSDP and SDPLR can exploit that but the packages are still WIP, help is welcome :slight_smile: