Encode matrix function which depends on optimization variables at each step

I’m considering a problem that involves constraints on the solution of a linear system whose coefficient matrix depends on the optimization variables. So essentially, the constraints are functions of a matrix inverse involving the optimization variables. I’m trying to figure out a good way of modeling this problem. As an example, consider the following model:

\begin{align} \underset{x \in \mathbb{R}^{n}}{\text{minimize}} &&& 0 \\ \text{s.t.} && \|C(x)_i \cdot x\|_2^2 &\leq b_i , \quad i \in \{1, \ldots, n\}\\ \end{align}

where C(x) = A(x)^{-1} \cdot B with constant data matrix B \in \mathbb{R}^{n \times n}, C(x)_i is the i-th row of C(x), and A(x)_{ij} = (x_i + x_j)^2 with i,j \in \{1, \ldots, n\}.

I’m considering two approaches:

  1. Enforce C(x)_{ij} = [A(x)^{-1} \cdot B]_{ij} entrywise via n^2 @NLconstraints and let JuMP’s AD handle the derivatives.
  2. Register a user-defined multivariate function \mathcal{C}: \mathbb{R}^{n} \to \mathbb{R}^{n \times n} and provide only the first derivative \frac{{\rm d}}{{\rm d}x} \mathcal{C}.

The issue with (1) is that, for larger systems, it introduces n^2 nonconvex constraints. The issue with (2) is that each constraint i only relies on the i-th row of C, so I think I’d have to compute \frac{{\rm d}}{{\rm d}x} \mathcal{C} redundantly in each constraint. I’m currently working on (2) and would appreciate any thoughts / guidance!

approach 1


using JuMP, Ipopt, LinearAlgebra

## data
n = 3
B = Diagonal(ones(n))
bounds = 1500ones(n)

## model
## min_x  0
## s.t.   C(x) = -A(x)^{-1} * B
##        ||C(x)_i * x||_2^2 <= rhs_i for i ∈ {1, ... , n}
m = Model(solver=IpoptSolver())
@variable(m, x[1:n])
@variable(m, C[1:n, 1:n])
setvalue(getindex(m, :x), collect(1:n))
# @NLobjective(m, Min, sum(x[i]^2 for i in 1:n))

## A(x)
A = Array{JuMP.NonlinearExpression,2}(undef, n, n)
for i in 1:n
    for j in 1:n
        A[i, j] = @NLexpression(m, (x[i] + x[j])^2)
    end
end
JuMP.registerobject(m, :A, A, "A")
setvalue(getindex(m, :C), getvalue(m[:A]) \ -B)

## C(x) = A^{-1}(x) * -B
@constraintref C_constraint[1:n, 1:n]
for i in 1:n
    for j in 1:n
        C_constraint[i, j] = @NLconstraint(m, sum(A[i,k] * C[k,j] for k in 1:n) + B[i,j] == 0)
        println(sum(getvalue(A[i,k]) * getvalue(C[k,j]) for k in 1:n) + B[i,j])  # residual of the (i,j) equation at the start values
    end
end
JuMP.registercon(m, :C_constraint, C_constraint)

## primary constraint
@constraintref primary_constraint[1:n]
for i in 1:n
    primary_constraint[i] = @NLconstraint(m, sum((C[i,k] * x[k])^2 for k in 1:n) <= bounds[i])
    println(sum((getvalue(C[i,k]) * getvalue(x[k]))^2 for k in 1:n))
end

solve(m)
xstar = getvalue(m[:x])
Cstar = getvalue(m[:C])
Astar = getvalue(m[:A])
norm(Astar * Cstar + B)

For the problem you posed, it looks like you can define the n constraints y = A(x)^{-1} B x and then n more constraints y_i^2 \le b_i, so you only have 2n constraints, and let JuMP’s AD handle the derivatives.
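A minimal sketch of that reformulation, in the same (older JuMP 0.18-style) syntax as the rest of this thread. The auxiliary variable y is new here, and the linear-system constraints are written as A(x) * y = B * x so that no explicit inverse appears; the data and start values mirror the approach-1 code above.

using JuMP, Ipopt, LinearAlgebra

## data, as in the approach-1 code
n = 3
B = Diagonal(ones(n))
bounds = 1500ones(n)

## model
## min_x  0
## s.t.   A(x) * y = B * x    (n constraints, i.e. y = A(x)^{-1} * B * x)
##        y_i^2 <= rhs_i      (n constraints)
m = Model(solver=IpoptSolver())
@variable(m, x[1:n])
@variable(m, y[1:n])                          # auxiliary variable y = C(x) * x
setvalue(getindex(m, :x), collect(1:n))

for i in 1:n
    @NLconstraint(m, sum((x[i] + x[k])^2 * y[k] for k in 1:n) ==
                     sum(B[i, k] * x[k] for k in 1:n))
end

for i in 1:n
    @NLconstraint(m, y[i]^2 <= bounds[i])
end

solve(m)
xstar = getvalue(m[:x])
ystar = getvalue(m[:y])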


@stevengj: Thanks, this is great! My particular problem is a bit more involved than the example, but maybe I can apply this style of approach! I’ll work from your suggestion for now and perhaps post a follow-up (if I get stuck) or a more detailed study (if I get it to work) :upside_down_face:

Just took a look and my “MWE” was not a faithful representation of the actual problem… The constraint actually looks like \|D^{1/2} \cdot C(x)_i \|_2 \leq b_i for D \in \mathbb{R}^{n \times n}, D \succ 0 constant. In this case, I believe that there is not a nice way of introducing auxiliary variables to reduce the number of constraints, right?

Updated code (w/ an objective function) below:

using JuMP, Ipopt
using SparseArrays, LinearAlgebra, Random
using ForwardDiff

## data
Random.seed!(1234)
n = 3
B = randn(n, n)
D = randn(n, n)
D = D'*D
D_12 = cholesky(D)   # D = U' * U with U = D_12.U, so U plays the role of D^{1/2}
bounds = 50ones(n)

function toymodel(n, B, D_12, bounds)
    ## model
    ## min_x  ||x||_2^2
    ## s.t.   C(x) = -A(x)^{-1} * B
    ##        ||D^{1/2} * C(x)_i||_2 <= rhs_i for i ∈ {1, ... , n}
    m = Model(solver=IpoptSolver())
    @variable(m, x[1:n])
    @variable(m, C[1:n, 1:n])
    setvalue(getindex(m, :x), collect(1:n))
    @NLobjective(m, Min, sum(x[i]^2 for i in 1:n))

    ## A(x)
    A = Array{JuMP.NonlinearExpression,2}(undef, n, n)
    for i in 1:n
        for j in 1:n
            A[i, j] = @NLexpression(m, (x[i] + x[j])^2)
        end
    end
    JuMP.registerobject(m, :A, A, "A")
    setvalue(getindex(m, :C), getvalue(m[:A]) \ -B)

    ## C(x) = A^{-1}(x) * -B
    @constraintref C_constraint[1:n, 1:n]
    for i in 1:n
        for j in 1:n
            C_constraint[i, j] = @NLconstraint(m, sum(A[i,k] * C[k,j] for k in 1:n) + B[i,j] == 0)
            println(sum(getvalue(A[i,k]) * getvalue(C[k,j]) for k in 1:n) + B[i,j])  # residual of the (i,j) equation at the start values
        end
    end
    JuMP.registercon(m, :C_constraint, C_constraint)

    ## primary constraint
    @constraintref primary_constraint[1:n]
    for i in 1:n
        primary_constraint[i] = @NLconstraint(m, sum(sum(D_12.U[j,k] * C[i,k] for k in 1:n)^2 for j in 1:n) <= bounds[i]^2)
        println(sum(sum(D_12.U[j,k] * getvalue(C[i,k]) for k in 1:n)^2 for j in 1:n))
    end
    return m
end

m = toymodel(n, B, D_12, bounds)
solve(m)
for i in 1:n
    for j in 1:n
        println(sum(getvalue(m[:A][i,k]) * getvalue(m[:C][k,j]) for k in 1:n) + B[i,j])  # should be ≈ 0 at the solution
    end
end
for i in 1:n
    println(sum(sum(D_12.U[j,k] * getvalue(m[:C][i,k]) for k in 1:n)^2 for j in 1:n))
end
xstar = getvalue(m[:x])
Cstar = getvalue(m[:C])
Astar = getvalue(m[:A])
norm(Astar*Cstar + B)  # = 9.308350126956984e-10

What your code computes is \Vert D^{1/2} C(x)_i^T\Vert. In this case you could do n constraints y = diag(C(x) * D * C(x)') and then have n more constraints y_i \le b_i^2.
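A rough sketch of what those n constraints could look like inside toymodel above, reusing its x, C, and C_constraint pieces and assuming the dense D (not just D_12) is in scope; this only replaces the primary_constraint loop and does not yet remove the n^2 C constraints.

    ## n "diag" constraints: [C * D * C']_{ii} = C(x)_i * D * C(x)_i' <= b_i^2
    for i in 1:n
        @NLconstraint(m, sum(sum(C[i, j] * D[j, k] * C[i, k] for k in 1:n) for j in 1:n) <= bounds[i]^2)
    end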


So this means opting for approach (2) and registering a function that computes C(x) := -A^{-1}(x) B together with its derivative? Otherwise I only see how to encode the full matrix Y via n^2 constraints, as

\begin{align} Y = C D C^{\top} \iff A Y A^{\top} = B D B^{\top} \end{align}

Or is there a way to rearrange the computation to enforce only y = {\rm diag}(Y) that I’m missing?
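For reference, the equivalence above follows from A(x) C(x) = -B (with A(x) invertible):

\begin{align} A Y A^{\top} = A \left( C D C^{\top} \right) A^{\top} = (A C) D (A C)^{\top} = (-B) D (-B)^{\top} = B D B^{\top} \end{align}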

I don’t know if JuMP has automated code for diag(M(x)) constraints or whether you have to implement it manually, but it’s definitely mathematically possible to have only n constraints (and compute n gradients) in a problem like yours.


Ah yeah, so I think I can get the n diag constraints to work (albeit somewhat inefficiently) in JuMP once a matrix M is computed. At this point, I guess the proper question is: how to best encode {C}(x) := -A^{-1}(x)B in JuMP. I’ll try registering a multivariate function with derivative

{\rm D}_x\,{\rm vec}(C(x)) = \left( (A(x)^{-1}B)^{\top} \otimes A(x)^{-1} \right) \cdot \frac{{\rm d}\,{\rm vec}(A(x))}{{\rm d}x}

but the current way I’m encoding y = diag(CDC') entry-by-entry requires a new call to C (and I’m guessing its derivative, too?) for each of the n constraints since JuMP doesn’t handle vectorized matrix multiplication / assignment in @NLconstraints (at least I think not).
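As a quick standalone sanity check of that Jacobian formula (outside JuMP, reusing the ForwardDiff import from the code above; Afun and Cfun are throwaway helper names introduced only for this check, with the same A(x)_{ij} = (x_i + x_j)^2 as in the toy model):

using LinearAlgebra, ForwardDiff

Afun(x) = [(x[i] + x[j])^2 for i in eachindex(x), j in eachindex(x)]   # A(x)
Cfun(x, B) = -(Afun(x) \ B)                                            # C(x) = -A(x)^{-1} * B

n = 3
B = randn(n, n)
x0 = collect(1.0:n)

## Jacobian of vec(C(x)) at x0 via ForwardDiff
J_ad = ForwardDiff.jacobian(x -> vec(Cfun(x, B)), x0)

## Kronecker-product formula: vec(dC) = ((A^{-1} B)' ⊗ A^{-1}) vec(dA)
Ainv   = inv(Afun(x0))
J_A    = ForwardDiff.jacobian(x -> vec(Afun(x)), x0)
J_kron = kron((Ainv * B)', Ainv) * J_A

norm(J_ad - J_kron)   # ≈ 0 if the formula is right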