# How to optimize trace of matrix inverse with JuMP or Convex?

Hi community,

I want to minimize $\operatorname{tr}(S^{-1}) + \operatorname{tr}\big((A - S)^{-1}\big)$ subject to $S \succeq 0$ and $A - S \succeq 0$, where $S$ is the matrix optimization variable and $A$ is a fixed positive definite matrix.
I tried using JuMP + Hypatia:

```julia
using Hypatia, JuMP, LinearAlgebra

p = 5
x = randn(p, p)
A = Symmetric(x' * x)

# model and constraints
model = Model(() -> Hypatia.Optimizer())
@variable(model, S[1:p, 1:p], Symmetric)
@constraint(model, S in PSDCone())
@constraint(model, A - S in PSDCone())

# objective
@NLobjective(model, Min, tr(inv(S)) + tr(inv(A - S)))

# solve and return
JuMP.optimize!(model)
```


which gives me the following error:

```
Unexpected array VariableRef[S[1,1] S[1,2] S[1,3] S[1,4] S[1,5]; S[1,2] S[2,2] S[2,3] S[2,4] S[2,5]; S[1,3] S[2,3] S[3,3] S[3,4] S[3,5]; S[1,4] S[2,4] S[3,4] S[4,4] S[4,5]; S[1,5] S[2,5] S[3,5] S[4,5] S[5,5]] in nonlinear expression. Nonlinear expressions may contain only scalar expressions.

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] parse_expression(#unused#::MathOptInterface.Nonlinear.Model, expr::MathOptInterface.Nonlinear.Expression, x::Symmetric{VariableRef, Matrix{VariableRef}}, parent_index::Int64)
    @ MathOptInterface.Nonlinear ~/.julia/packages/MathOptInterface/cl3eR/src/Nonlinear/parse.jl:235
  [3] parse_expression(data::MathOptInterface.Nonlinear.Model, expr::MathOptInterface.Nonlinear.Expression, x::Expr, parent_index::Int64)
    @ MathOptInterface.Nonlinear ~/.julia/packages/MathOptInterface/cl3eR/src/Nonlinear/parse.jl:52
  [4] parse_expression
    @ ~/.julia/packages/MathOptInterface/cl3eR/src/Nonlinear/parse.jl:14 [inlined]
  [5] set_objective
    @ ~/.julia/packages/MathOptInterface/cl3eR/src/Nonlinear/model.jl:43 [inlined]
  [6] set_nonlinear_objective(model::Model, sense::MathOptInterface.OptimizationSense, x::Expr)
    @ JuMP ~/.julia/packages/JuMP/puvTM/src/nlp.jl:163
  [7] macro expansion
    @ ~/.julia/packages/JuMP/puvTM/src/macros.jl:2332 [inlined]
  [8] top-level scope
    @ In[4]:12
  [9] eval
    @ ./boot.jl:373 [inlined]
 [10] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
```


I also tried introducing a variable $t$ with $t \ge \operatorname{tr}(S^{-1})$, but there I get a `no method matching VariableRef(::AffExpr)` error.

It seems CVX supports `trace_inv`. Is there a similar alternative in JuMP or Convex.jl? I couldn't find one.

Any help is highly appreciated.

I don't know if there's an easy way to do this in JuMP. You can't call arbitrary functions like `inv` inside `@NL` macros (see Nonlinear Modeling · JuMP), and even if you could, Hypatia doesn't support the nonlinear interface.

There might be some relationship you can use to reformulate `tr(inv(X))` into something else, but I'm not sure. Someone else might have a good idea.

Also, as a tip, you can change your model to the slightly more compact form:

```julia
model = Model(Hypatia.Optimizer)
@variable(model, S[1:p, 1:p], PSD)
@constraint(model, A >= S, PSDCone())
```


Thank you for your quick response. I found this post, which recommends I do the following:

```julia
using Hypatia, JuMP, LinearAlgebra

p = 5
x = randn(p, p)
A = Symmetric(x' * x)

# model and constraints
model = Model(() -> Hypatia.Optimizer())
@variable(model, S[1:p, 1:p], PSD)
@constraint(model, A >= S, PSDCone())

# introduce X and Y
@variable(model, X[1:p, 1:p])
@variable(model, Y[1:p, 1:p])
@constraint(model, [X I; I S] in PSDCone())
@constraint(model, [Y I; I A-S] in PSDCone())

# objective
@objective(model, Min, tr(X) + tr(Y))

# solve and return
JuMP.optimize!(model)
```


which seems to work. The idea is that instead of working with $\operatorname{tr}(S^{-1})$ in the objective, we work with $\operatorname{tr}(X)$ where $\begin{bmatrix} X & I \\ I & S \end{bmatrix} \succeq 0$. By a Schur complement argument (for $S \succ 0$), $\begin{bmatrix} X & I \\ I & S \end{bmatrix} \succeq 0 \iff X - S^{-1} \succeq 0 \iff X \succeq S^{-1}$, which implies $\operatorname{tr}(X) \ge \operatorname{tr}(S^{-1})$, and the inequality is tight at the optimum.
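As a quick numerical sanity check (my own addition, assuming the model above has solved to optimality), the epigraph objective and the original trace-inverse objective should agree:

```julia
# Compare the epigraph objective tr(X) + tr(Y) with the original
# objective evaluated at the solution; they agree when the bound is tight.
Sval = value.(S)
println("epigraph objective:      ", objective_value(model))
println("tr(S^-1) + tr((A-S)^-1): ", tr(inv(Sval)) + tr(inv(A - Sval)))
```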


Nice! That's a useful trick to know.

You could try to optimize over the trace inverse directly. One of the more complicated cones Hypatia has is `EpiPerSepSpectralCone`, which is parameterized by a domain and a "separable spectral function". For this case, the separable spectral function of interest is the trace inverse. The domain of interest is the PSD cone.

There is no exported function type for "trace inverse", but that's because it is the convex conjugate of the negative trace square root, `NegSqrtSSF` (scaled by a factor of 4). You can make the following dual cone:

```julia
cone = EpiPerSepSpectralCone{Float64}(Hypatia.Cones.NegSqrtSSF(), Hypatia.Cones.MatrixCSqr{Float64, Float64}, p, true)
```


For reference, see Table 1 of https://arxiv.org/pdf/2103.04104.pdf.
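Concretely, here is where the factor of 4 comes from (a sketch of the scalar computation; Hypatia applies the conjugate eigenvalue-wise, and I am glossing over its exact sign convention for dual cones). For $h(x) = -\sqrt{x}$ on $x \ge 0$ and any $y > 0$,

$$\sup_{x \ge 0}\left(\sqrt{x} - x y\right) = \frac{1}{4y}, \qquad \text{attained at } x = \frac{1}{4y^2}.$$

Summed over the eigenvalues of $W \succ 0$, this gives $\tfrac{1}{4}\operatorname{tr}(W^{-1})$, which is why the objective below multiplies $t_1 + t_2$ by 4.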

```julia
model = Model(Hypatia.Optimizer)
@variable(model, t1)
@variable(model, t2)
@variable(model, S[1:p, 1:p], Symmetric)
@constraint(model, vcat(1.0, t1, [(A[i, j] - S[i, j]) * (i == j ? 1 : sqrt(2)) for i in 1:p for j in 1:i]...) in cone)
@constraint(model, vcat(1.0, t2, [S[i, j] * (i == j ? 1 : sqrt(2)) for i in 1:p for j in 1:i]...) in cone)
@objective(model, Min, 4 * (t1 + t2))
```


`A - S in PSDCone()` and `S in PSDCone()` are implied by the cone constraints.
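To check both the objective and the implied PSD constraints after solving (my own sketch; it reuses `model`, `S`, `t1`, `t2`, and `A` from above, plus `using LinearAlgebra` for `eigvals`):

```julia
JuMP.optimize!(model)
Sval = value.(S)

# objective recovered from the epigraph variables
println("objective: ", 4 * (value(t1) + value(t2)))

# the PSD constraints are implied, so these should be >= 0 up to solver tolerance
println("min eigenvalue of S:     ", minimum(eigvals(Symmetric(Sval))))
println("min eigenvalue of A - S: ", minimum(eigvals(Symmetric(A - Sval))))
```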


@lkapelevich Thank you for the suggestion.

I tried your approach on the same problem. The answers agree, and when `p = 50`, your solution took ~120 seconds on my machine, while my original solution took ~960 seconds. So it's definitely worth looking into, although I'm not really sure what you are doing. I'll have to spend more time on it. Thank you so much!!


I skipped a lot of steps; happy to clarify anything. For:

```julia
cone = EpiPerSepSpectralCone{T}(
    h::Cones.SepSpectralFun,
    Q::Type{<:Cones.ConeOfSquares{T}},
    d::Int,
    use_dual::Bool,
)
```


we are making a cone that is the closure of

$$\{(u, v, w) : v > 0,\; w \in Q,\; u \ge v \cdot h(w / v)\}$$

if `use_dual` is false, and

$$\{(u, v, w) : u > 0,\; w \in Q,\; v \ge u \cdot h^\ast(w / u)\}$$

if `use_dual` is true.

I picked `h = NegSqrtSSF()` and `Q = MatrixCSqr{Float64, Float64}`, which is analogous to the cone of PSD matrices. Then:

```julia
@constraint(model, vcat(1.0, t2, [S[i, j] * (i == j ? 1 : sqrt(2)) for i in 1:p for j in 1:i]...) in cone)
```


says $t_2 \ge \operatorname{tr}(S^{-1})$ and that $S$ is PSD. The matrix variable `S` is vectorized and its off-diagonal entries are scaled by $\sqrt{2}$ because that's how matrix variables are accepted natively (the scaled svec format).
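For reference, a small helper (my own sketch, not part of Hypatia's API) that performs the same vectorization as the comprehensions above:

```julia
# svec-style vectorization used in the constraints above: stack the lower
# triangle row by row, scaling off-diagonal entries by sqrt(2) so that
# dot(svec_scale(X), svec_scale(Y)) == tr(X * Y) for symmetric X and Y.
svec_scale(M) = [M[i, j] * (i == j ? 1 : sqrt(2)) for i in 1:size(M, 1) for j in 1:i]

# the constraint on S can then be written as:
# @constraint(model, vcat(1.0, t2, svec_scale(S)...) in cone)
```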
