Solve for Hermitian PSD with non-linear objective in JuMP

I am new to using JuMP, and was hoping for some help. I am trying to estimate the elements of a matrix that is:

  • Hermitian (in general, with complex entries)
  • Positive semidefinite
  • Diagonal elements >= 0
  • Trace == 1

After reading a good portion of the docs and googling, I think I understand how to define the variables and constraints (given that JuMP only accepts real variables), so that for A = AR + i AI,

using JuMP

model = Model()
# real and imaginary parts of A = AR + i AI
@variable(model, AR[1:4,1:4], Symmetric)
@variable(model, AI[1:4,1:4] in SkewSymmetricMatrixSpace())
# A is PSD iff its real 8x8 embedding is PSD
@constraint(model, [AR AI; -AI AR] in JuMP.PSDCone())
@constraint(model, conAR[i = 1:4], AR[i,i] >= 0)
@constraint(model, conAI[i = 1:4], AI[i,i] == 0)  # the diagonal of a Hermitian matrix is real
@constraint(model, sum(AR[i,i] for i in 1:4) == 1)  # trace == 1

However, the objective function I am maximizing is non-linear (but well-behaved and differentiable). I am not an expert in complicated optimization problems, but from the documentation there do not seem to be any solvers that advertise support for both non-linear and semidefinite programming. Does anyone have a suggestion for a solver for this problem? I've tried a few, but everything has protested… can I simply not include the PSD constraints in a non-linear problem?

Thanks in advance!

I don't think any solver supports PSD + nonlinear directly.

If the objective function is convex, reformulate it as a conic optimization problem.

If the objective function is non-convex, you're out of luck. You could try forming a convex approximation of the objective function to generate local solutions. Or you could try rewriting the PSD constraint as a nonlinear user-defined function (see Nonlinear Modeling · JuMP), e.g., eigen(x...) >= 0. I don't have any experience doing so, but others may have.
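For what it's worth, here is a rough, untested sketch of that last idea using JuMP's user-defined operator interface (@operator, JuMP >= 1.15) and a local NLP solver; Ipopt, the name op_eigmin, and the hand-coded gradient are my own assumptions rather than anything established in this thread, and the smallest eigenvalue is non-smooth where eigenvalues coincide, so treat it purely as an illustration:

using JuMP, Ipopt, LinearAlgebra

n = 4
# Smallest eigenvalue of a symmetric matrix, passed as n^2 scalar arguments.
f(x...) = eigmin(Symmetric(reshape(collect(x), n, n)))

# Hand-coded gradient: dλ_min/dA = v*v' for the unit eigenvector v of λ_min
# (valid when λ_min is simple); ForwardDiff cannot differentiate through eigmin.
function ∇f(g::AbstractVector, x...)
    A = Symmetric(reshape(collect(x), n, n))
    v = eigen(A).vectors[:, 1]  # eigenvalues of a Symmetric matrix are sorted ascending
    g .= vec(v * v')
    return
end

model = Model(Ipopt.Optimizer)
@variable(model, A[1:n, 1:n], Symmetric)
@constraint(model, sum(A[i, i] for i in 1:n) == 1)  # trace constraint
@operator(model, op_eigmin, 16, f, ∇f)              # 16 = n^2 scalar arguments
xs = vec([A[i, j] for i in 1:n, j in 1:n])          # column-major, matching reshape above
psd_expr = op_eigmin(xs...)                         # NonlinearOperator objects are callable
@constraint(model, psd_expr >= 0)                   # "A is PSD" as a scalar nonlinear constraint
# ...add the nonlinear objective with @objective(model, Max, ...) and call optimize!(model)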

Thanks for the reply - I was afraid that was the answer.
I haven't proved it, but I'm fairly certain my objective function is convex - unfortunately I have no experience with conic optimization formulations. I guess learning that is the next step.

If you can provide a minimal working example, people may have further advice.

@odow, I would provide an MWE, except my issue here is more conceptual than it is code not working. But I can describe the objective function to see if anyone has thoughts on how to fit it into the JuMP framework as anything other than a non-linear objective.

As I said, I am trying to estimate a matrix (a quantum mechanical density operator \rho_0) with the Hermitian/PSD constraints in my first post. This optimization is for a tomographic process; first, a linear transformation U is applied so that

U \rho_0 U^\dagger \to \rho.

The diagonal elements of the new matrix form the vector \mathbf{p} of relative probabilities that each mode will be measured - for an N x N density operator, there are N modes. Measurements are collected and accumulated into the vector of counts \mathbf{k}, so the measurements follow a multinomial distribution with probability vector \mathbf{p}. We want to maximize the likelihood, which (after discarding factors that don't depend on \rho) yields

\mathcal{L}(\rho) \propto \prod_{i=1}^N p_i^{k_i}

For the estimation to work, this is done for several different transformation matrices \{U^{(j)}\}, and the likelihood of all of the measurements is just the product of the marginal likelihoods. For the objective function I then choose the log of the likelihood, which (again after discarding parts that don't depend on the measurements or \rho) is

C(\rho) = \sum_{j=1}^M \sum_{i=1}^N k_{i,j} \log p_{i,j}.

I am fairly certain this is a convex problem in the elements of \rho - each p_{i,j} is linear in the elements of \rho, log is concave, and the cost is a non-negative weighted sum of those concave terms (so maximizing it is a convex problem). But this is obviously not a linear objective function. If you know how I might fit this into something other than a non-linear objective in JuMP, I would be very grateful.
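To make the structure of the objective concrete, here is a small sketch of evaluating C(\rho) for a fixed \rho and fixed data (the names loglikelihood, Us and counts are placeholders of mine, not anything defined above):

using LinearAlgebra

# C(ρ) = Σ_j Σ_i counts[i, j] * log(p[i, j]),  with  p[:, j] = diag(U[j] * ρ * U[j]')
function loglikelihood(ρ::AbstractMatrix, Us::Vector{<:AbstractMatrix}, counts::AbstractMatrix)
    C = 0.0
    for (j, U) in enumerate(Us)
        p = real.(diag(U * ρ * U'))  # probabilities for measurement setting j
        C += sum(counts[i, j] * log(p[i]) for i in eachindex(p))
    end
    return C
end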

As a side note, I have already implemented generic global non-linear estimators based on MCMC and simulated annealing, so I am less interested in using JuMP as a general non-linear optimizer. I was hoping to exploit the structure to get a faster, more robust (and less noisy) estimate. It is easy enough to take derivatives of the objective and do gradient descent, but I didn't know how to incorporate the constraints until I came across semidefinite programming. But if I can't get it to work, at least I've learned something useful. Thanks again.

log is representable using the exponential cone. Not sure I fully understand your problem statement (I assume U and k are data?), but this should point you in the right direction:

using JuMP, SCS

model = Model(SCS.Optimizer)
@variables(model, begin
    AR[1:4,1:4], Symmetric
    AI[1:4,1:4] in SkewSymmetricMatrixSpace()
end)
P0 = [AR AI; -AI AR]
N = size(P0, 1)
@constraints(model, begin
    [i = 1:4], AR[i,i] >= 0
    [i = 1:4], AI[i,i] == 0
    sum(AR[i,i] for i in 1:4) == 1
    P0 in PSDCone()
end)
# For one U, but you can generalize
k = rand(1:10, N)  # fake data
p = [P0[i, i] for i in 1:N]
@variable(model, t[1:N])
# t[i] <= log(p[i]), written as (t, 1, p) in the exponential cone
@constraint(model, [i = 1:N], [t[i], 1, p[i]] in MOI.ExponentialCone())
@objective(model, Max, sum(k[i] * t[i] for i in 1:N))
optimize!(model)
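For reference, the epigraph trick the exponential-cone constraint above implements (my summary, using MOI's ordering (x, y, z) with y e^{x/y} \le z for the exponential cone) is

\max_{\rho,\, t} \sum_i k_i t_i \quad \text{s.t.} \quad t_i \le \log p_i \iff e^{t_i} \le p_i \iff (t_i, 1, p_i) \in K_{\exp}.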

This is a good reference: 5 Exponential cone optimization — MOSEK Modeling Cookbook 3.3.0

@ericphanson may be able to point to some similar examples using https://github.com/jump-dev/Convex.jl. He worked in this space.

For example: Fidelity in quantum information theory · Convex.jl

This is extremely helpful - somehow I haven't come across conic optimization before. I have a lot of reading to do.

The code you posted is close to what I need (it still needs to incorporate the unitary transformation U), but as you said, I think this will point me in the right direction. Convex.jl also looks intriguing - it might be simpler to pose the problem in that formulation. I'll try out both. Also very interesting to see an optimization on state fidelity - while that's not what I am doing here, I may need to in the future.

The MOSEK cookbook is an excellent reference for conic formulations.

See also this example in Convex.jl: Complex-domain Optimization · Convex.jl

I had come across that example as well - it certainly seems like Convex.jl is a good framework for what I am trying to do. I think I have it written correctly, but the solver reports that the problem is infeasible. This is a little disappointing, as simulated annealing returns essentially the correct answer, but I may still be doing something wrong. What I have is this:

using Convex, SCS
using LinearAlgebra  # for tr

# k is a length M vector of integers (measured counts)
# V is a length M vector of complex vectors of length 4
ρ = HermitianSemidefinite(4)
p0 = [quadform(v, ρ) for v in V]
obj = sum(k[i] * log(real(p0[i])) for i in 1:M)
problem = maximize(obj)
problem.constraints += [real(ρ[i,i]) >= 0 for i in 1:4]
problem.constraints += tr(ρ) == 1.0
solve!(problem, SCS.Optimizer)

If I restrict to a subset of the measurements, it will occasionally return an inaccurate answer (and says so). If the model is correct, I am not sure what else I can do. It may be that a different set of measurements would better condition the solution, but I am not the one taking the measurements.

Thanks again for the help!

Penopt should support this combination, but the wrapper doesn't seem complete.

I might have missed it, but can you share the solution simulated annealing finds? If it fits those constraints, then there could be a bug or a numerical issue. If it doesn't, then maybe the constraints are too restrictive (if you are happy with the SA solution that does not meet them, that is).

Edit: I guess one would also need k and V to check this.

The k and V vectors are reasonably long, and as they are data collected by others, I don't know that I can share them. The short version is that simulated annealing reconstructs the correct density operator. I only have two thoughts on what might be going on. First, the more I look at it, the less sure I am that the measurements I have are complete enough to truly solve the inverse problem - it's great that a random-walk process like simulated annealing worked, but this needs more investigation. Second, despite specifying the density operator as complex, in the cases where I did get an (incorrect) answer by restricting the data, the complex part was always exactly 0. Not sure why.

I think my next step is to write my own simulator to try generating data that I can trust, and see how that goes. If that still has issues, at least I can then provide an MWE. Thanks for the input.

Ok, it will be interesting to see with an MWE! If there are numerical issues with the solvers, there are various scaling tricks one can try (generally it's best if all values are of the same order of magnitude). I'm not very familiar with these things though.

Convex reformulates all problems as a linear objective function subject to affine-function-in-a-cone constraints (for a few well-understood cones), which is a pretty constrained class, and things should generally be fairly well behaved (when using continuous variables as you are; with integer or binary constraints things are different). In particular, the problems are all convex (as the name implies), so they have no local minima, only global ones, and under a few conditions you get poly-time convergence to the optimal value, etc. So generally, if you can formulate the problem you want (without getting DCP warnings!), it "should" work (but it could be too big to fit into memory, have numerical issues, be slow, or there could be a bug).

Edit: one thought, kind of discussed before: if the constraints are determined by real-world noisy data, it seems easy to believe there is no solution that fits them all (the problem is infeasible). Depending on how you handled the constraints with SA, perhaps you are finding a solution that is close to fitting the constraints with a good objective value but is not technically feasible. In that case, loosening the constraints in Convex might give you a similar solution.

If the matrix is only four-dimensional, then if it were me I'd probably just use the optimization algorithm to find the Cholesky factor (namely, lower triangular with the constraint of non-negative diagonal values). I bet you could use Symbolics.jl or something to find the exact expression for the trace of a 4x4 matrix from its Cholesky factor, and you could presumably then plug that right into JuMP to deal with that constraint. Otherwise, finding the Cholesky factor gets rid of the rest of your constraints. So that might work. I don't work in QM, but I often try to estimate/recover/whatever low-dimensional SPSD matrices from Cholesky factors, and it's been reasonably smooth sailing.
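A minimal sketch of this parameterization (the helper density_from_cholesky and the normalize-by-trace trick are my own additions; the suggestion above would instead impose the trace expression as a constraint):

using LinearAlgebra

# Map a real parameter vector θ (length n^2) to a valid n x n density matrix via a
# Cholesky-like factor: ρ = L*L' is Hermitian PSD by construction, and dividing by
# the trace enforces tr(ρ) == 1 (a non-negative diagonal of L only matters for uniqueness).
function density_from_cholesky(θ::AbstractVector{<:Real}, n::Integer = 4)
    L = zeros(ComplexF64, n, n)
    idx = 1
    for j in 1:n, i in j:n        # real parts of the lower triangle: n(n+1)/2 entries
        L[i, j] = θ[idx]
        idx += 1
    end
    for j in 1:n, i in (j + 1):n  # imaginary parts below the diagonal: n(n-1)/2 entries
        L[i, j] += im * θ[idx]
        idx += 1
    end
    ρ = Hermitian(L * L')
    return ρ / real(tr(ρ))
end

# e.g. density_from_cholesky(randn(16)) is Hermitian, PSD, and has unit trace, so a
# generic unconstrained optimizer can work directly on the 16 real parameters.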

I'll post once I have an MWE with simulated data, but it might be a few days (other work needs to get done). I had the thought that the data might be the issue, but it isn't a matter of satisfying the constraints. There are plenty of matrices that fit the constraints, and I just want the one that maximizes the likelihood of the measurements. In SA it is fairly straightforward to satisfy the constraints (just force every new candidate to be a valid density matrix: randomly perturb the matrix maintaining Hermitian-ness, take an eigendecomposition, and force the eigenvalues to be non-negative).

This is where my lack of familiarity with conic constraints might be the problem. I can believe that the way I am writing the objective function and constraints - while technically correct - might start the solver in a bad spot from which it can never get to a valid matrix.

I had wondered if solving for fewer variables in one of the decompositions (Cholesky, LDL, eigen, etc.) would be useful - thanks for the suggestion. If I am still having issues after trying it on clean, simulated data, I will look into that. I've been wanting to find an excuse to try out Symbolics.jl for a while now.

Makes sense. Of the matrix factorizations, I definitely think Cholesky is the one you want. For an SPSD matrix I can't think of an advantage of LDLt, and an eigendecomposition/QR/whatever again means you need to constrain the eigenvector/Q/whatever matrix to be orthogonal, which is probably annoying and leads to a much higher-dimensional optimization problem. Whereas with the Cholesky factor, you have no constraints on the elements of L besides the diagonal entries being non-negative.

Thinking about this for an extra second, I suppose if some of the diagonals of L are zero then the representation stops being unique, but presumably there's some other kind of structure you could choose to get uniqueness back.

So I found some time to work on this today, and I think I figured it out. Below is an MWE with a fake-data generator. When I did the optimization, it worked! However, the simulated data was qualitatively the same as the measurements I was working with… the only difference I noticed was the scaling - the measurements had counts above 100000, while my simulation was around 1000. If you increase N to 100000 in the MWE below, the solution goes from ALMOST_OPTIMAL (and essentially correct) to INFEASIBLE. For large N, this essentially just scales the count vector K, which, carried through the calculation, just scales the objective function.

I therefore went back to my measurements, divided the counts by 1000 (rounding to an integer), and it worked! A fast, accurate maximum-likelihood estimate.

So at this point I have two remaining questions. First, any idea why scaling the objective function would cause a problem for Convex? @ericphanson, is this what you meant by "generally it's best if all values are of the same order of magnitude"? Is there a best practice to avoid this problem? I'm not clear on what it means to have the objective function be of the same order of magnitude as a constraint… Second, any idea what is causing the ALMOST_OPTIMAL instead of OPTIMAL in the solution?

If I get time, I may try @cgeoga's suggestion and solve for the Cholesky form, and see if that has the same issues (but since this is now working, that is unlikely to happen soon).

Thanks for all of the help / suggestions! Here is the MWE if you want to reproduce the INFEASIBLE issue when scaling up the counts.

using Convex
using SCS
using Distributions
using LinearAlgebra  # for diag, tr
"""
N is the number of samples in each measurement.
Below ~100, the estimate degrades from noise, but at 100000
the solver breaks and says the solution is INFEASIBLE...
Between those points it finds a solution that is ALMOST_OPTIMAL.
"""
N = 1000

mh = [-1 0; 0 1];
mq = [1im 0; 0 1];
R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)];
MH(θ) = R(θ) * mh * R(-θ)
MQ(θ) = R(θ) * mq * R(-θ)
UL(θ) = MH(θ/2) * MQ(θ)
UC = MH(0) * MQ(-π/4)

Uv = [UL(0), UL(π/4), UC]

ρ0 = [0.5 0 0 0.5;
      0 0 0 0;
      0 0 0 0;
      0.5 0 0 0.5]

function combinematrices(U1, U2)
    M = zeros(ComplexF64, 4, 4)
    for i in 1:2, j in 1:2
        ij = 2 * (i - 1) + j
        for k in 1:2, l in 1:2
            kl = 2 * (k - 1) + l
            M[kl, ij] = U1[k, i] * U2[l, j]
        end
    end
    return M
end

""" Generate fake data """
K = Int[]
V = Vector{ComplexF64}[]
# generate samples
for U1 in Uv
    for U2 in Uv
        U12 = combinematrices(U1, U2)
        ρ1 = U12 * ρ0 * adjoint(U12)
        pr = abs.(diag(ρ1))
        d = Multinomial(N, pr)

        k = rand(d)
        append!(K, k)
        for i in 1:4
            v = conj.(U12[i, :]) # this is already a column vector, so don't take the adjoint, just conjugate
            push!(V, v)
        end
    end
end

""" Define and solve optimization problem """
ρ = HermitianSemidefinite(4)
p0 = [real(quadform(v, ρ)) for v in V]
obj = sum(K[i] * log(p0[i]) for i in 1:length(V))
problem = maximize(obj)
problem.constraints += [real(ρ[i, i]) >= 0 for i in 1:4]
problem.constraints += tr(ρ) == 1.0;

solve!(problem, SCS.Optimizer)
ρmle = evaluate(ρ)

Edit: If you normalize the K vector, the ALMOST_OPTIMAL goes away in this example. So it appears all of my issues have to do with scaling.
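For concreteness, one way to do that normalization in the script above (rescaling the counts only scales the objective, so the maximizer is unchanged):

Kn = K ./ sum(K)                                    # normalized counts
obj = sum(Kn[i] * log(p0[i]) for i in 1:length(V))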
