So I found some time to work on this today, and I think I figured it out. Below is an MWE with a fake data generator, and when I ran the optimization, it worked! The simulated data is qualitatively the same as the measurements I was working with… The only difference I noticed was the scaling: the measurements had counts above 100000, while my simulation was around 1000. If you increase N to 100000 in the MWE below, the solution goes from ALMOST_OPTIMAL (and essentially correct) to INFEASIBLE. Increasing N essentially just scales the count vector K, and carrying that through the calculation amounts to scaling the objective function by a constant.
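To spell that out (this is just my reasoning, and the numbers below are made up for illustration): multiplying every count by a factor c multiplies the whole log-likelihood by c, so the maximizer doesn't change, only the magnitude of the objective the solver sees does.

```julia
# Tiny numerical check of the scaling argument (hypothetical toy numbers):
# multiplying every count by c multiplies the log-likelihood by c.
K_toy = [120, 340, 540]
p_toy = [0.1, 0.3, 0.6]
c = 100.0
loglik(k, p) = sum(k .* log.(p))
@assert loglik(c .* K_toy, p_toy) ≈ c * loglik(K_toy, p_toy)
```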
I therefore went back to my measurements, divided the counts by 1000 (and rounded to an integer), and it worked! A fast, accurate maximum likelihood estimate.
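For concreteness, the rescaling of the measured counts amounts to something like this, where `K_raw` is a stand-in name for my raw count vector (it is not part of the MWE below):

```julia
# K_raw is a placeholder for the raw measured counts (values above 100000);
# dividing by 1000 and rounding brings them to the scale where the solver behaves.
K = round.(Int, K_raw ./ 1000)
```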
So at this point I have two remaining questions. First, any idea why scaling the objective function would cause a problem for Convex? @ericphanson, is this what you meant by your suggestion about keeping the objective function on the same order of magnitude as the constraints? Is there a best practice to avoid this problem? I'm not clear on what it means for the objective function to be of the same order of magnitude as a constraint… Second, any idea what is causing the ALMOST_OPTIMAL instead of OPTIMAL in the solution?
If I get time I may try @cgeoga's suggestion of solving for the Cholesky form instead, and see whether that has the same scaling issues (but since this is now working, that is unlikely to happen soon).
Thanks for all of the help / suggestions! Here is the MWE if you want to reproduce the INFEASIBLE issue when scaling up the counts.
using Convex
using SCS
using Distributions
using LinearAlgebra  # for diag and tr
"""
N is number of samples in each measurement.
Below ~100, estimate degrades from noise, but at 100000
the solver breaks, says solution is INFEASIBLE...
Between those points it finds a solution that is ALMOST_OPTIMAL
"""
N = 1000
mh = [-1 0; 0 1];
mq = [1im 0; 0 1];
R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)];  # 2x2 rotation matrix
MH(θ) = R(θ) * mh * R(-θ)                # mh rotated by θ
MQ(θ) = R(θ) * mq * R(-θ)                # mq rotated by θ
UL(θ) = MH(θ/2) * MQ(θ)
UC = MH(0) * MQ(-π/4)
Uv = [UL(0), UL(π/4), UC]                # the three settings used on each mode
ρ0 = [0.5 0 0 0.5;
      0   0 0 0;
      0   0 0 0;
      0.5 0 0 0.5]
function combinematrices(U1, U2)
    # build the two-mode operator elementwise (equivalent to kron(U1, U2))
    M = zeros(ComplexF64, 4, 4)
    for i in 1:2, j in 1:2
        ij = 2 * (i - 1) + j
        for k in 1:2, l in 1:2
            kl = 2 * (k - 1) + l
            M[kl, ij] = U1[k, i] * U2[l, j]
        end
    end
    return M
end
""" Generate fake data """
K = Int[]
V = Vector{ComplexF64}[]
# generate samples
for U1 in Uv
    for U2 in Uv
        U12 = combinematrices(U1, U2)
        ρ1 = U12 * ρ0 * adjoint(U12)
        pr = abs.(diag(ρ1))
        d = Multinomial(N, pr)
        k = rand(d)
        append!(K, k)
        for i in 1:4
            v = conj.(U12[i, :]) # this is already a column vector, so don't take adjoint, just conjugate
            push!(V, v)
        end
    end
end
""" Define and solve optimization problem """
ρ = HermitianSemidefinite(4)
p0 = [real(quadform(v, ρ)) for v in V]
obj = sum(K[i] * log(p0[i]) for i in 1:length(V))
problem = maximize(obj)
problem.constraints += [real(ρ[i, i]) >= 0 for i in 1:4]
problem.constraints += tr(ρ) == 1.0;
solve!(problem, SCS.Optimizer)
ρmle = evaluate(ρ)
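Not part of the MWE itself, but a quick way I sanity-check the result is to look at the termination status and compare the estimate to the true state (purely illustrative, using only names defined above):

```julia
# Quick checks on the solution (illustrative, not part of the MWE)
@show problem.status              # termination status (OPTIMAL / ALMOST_OPTIMAL / ...)
@show problem.optval              # value of the log-likelihood objective at the solution
@show tr(ρmle)                    # should be ≈ 1 because of the trace constraint
@show maximum(abs.(ρmle .- ρ0))   # elementwise error relative to the true state
```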
Edit: If you normalize the K vector, the ALMOST_OPTIMAL goes away in this example. So it appears all of my issues have to do with scaling.
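By "normalize the K vector" I mean something like dividing the counts by their total, so the objective becomes an average rather than a sum of log-likelihood terms (sketch below reuses the variables from the MWE):

```julia
# One way to normalize the counts (divide by their total), which leaves the
# maximizer unchanged but shrinks the magnitude of the objective the solver sees.
Kn = K ./ sum(K)
obj = sum(Kn[i] * log(p0[i]) for i in 1:length(V))
problem = maximize(obj)
problem.constraints += [real(ρ[i, i]) >= 0 for i in 1:4]
problem.constraints += tr(ρ) == 1.0
solve!(problem, SCS.Optimizer)
ρmle = evaluate(ρ)
```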