Minimisation of operator norm—solution does not match evaluated value [Convex.jl]

I’m using Convex.jl to minimize the operator norm of a matrix (minimize opnorm(J,Inf)), but the solution (problem.optval) does not match the evaluated value after performing the optimisation (opnorm(evaluate(J,Inf))).

A simplified example follows below. The matrix takes a specific form (a Choi matrix of the difference of two quantum channels), and would depend on the problem at hand, but the randomly-generated matrices demonstrates the issue.

using Convex, SCS, LinearAlgebra

function opt_norm_issue(U,K;d₁,d₂)
    ρ = Semidefinite(d₂)
    J = sum(kron(
            ((1:d₁).==j)*((1:d₁).==k)',
            partialtrace(
                U'*(
                    K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                      kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
                )*U
            , 2, [d₁,d₂])
        ) for j ∈ 1:d₁, k ∈ 1:d₁)

    constraints = [
        tr(ρ) == 1
    ]

    optimizer = SCS.Optimizer()
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_rel"), 1E-12)

    problem = minimize(opnorm(J,Inf), constraints)
    solve!(problem, optimizer)

    return problem.optval, opnorm(evaluate(J), Inf)
end

Running the following lines a few times with different values of d₁ and d₂ show that problem.optval and opnorm(evaluate(J),Inf) do not match, and can be quite different for some K and U.

d₁,d₂ = 2,2;
H = rand(ComplexF64,d₁*d₂,d₁*d₂);
U = exp(im*π*(H+H'));
K = rand(ComplexF64,d₁*d₂,d₁*d₂);
K *= d₁/tr(K'K);

opt_norm_issue(U,K;d₁,d₂)

Any ideas why this is happening? It’s particularly problematic because I need the arguments of the optimal result, but the arguments do not actually give the (apparent) optimal value.

Hi there, welcome to the forum.

This is an interesting problem. For others reading, this is what I get:

julia> using Convex, SCS, LinearAlgebra

julia> const MOI = Convex.MOI
MathOptInterface

julia> function opt_norm_issue(U, K; d₁, d₂)
           ρ = Semidefinite(d₂)
           J = sum(
               kron(
                   ((1:d₁).==j)*((1:d₁).==k)',
                   partialtrace(
                       U'*(
                           K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                             kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
                       )*U
                   , 2, [d₁,d₂])
               )
               for j in 1:d₁, k in 1:d₁
           )
           constraints = [LinearAlgebra.tr(ρ) == 1]
           solver = SCS.Optimizer()
           MOI.set(solver, MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
           MOI.set(solver, MOI.RawOptimizerAttribute("eps_rel"), 1E-12)
           problem = minimize(opnorm(J, Inf), constraints)
           solve!(problem, solver)
           return problem.optval, opnorm(evaluate(J), Inf)
       end
opt_norm_issue (generic function with 1 method)

julia> d₁, d₂ = 2, 2;

julia> H = rand(ComplexF64, d₁ * d₂, d₁ * d₂);

julia> U = exp(im * π * (H + H'));

julia> K = rand(ComplexF64, d₁ * d₂, d₁ * d₂);

julia> K *= d₁ / LinearAlgebra.tr(K' * K);

julia> opt_norm_issue(U, K; d₁, d₂)
------------------------------------------------------------------
	       SCS v3.2.3 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 38, constraints m: 74
cones: 	  z: primal zero / dual free vars: 3
	  l: linear vars: 20
	  q: soc vars: 48, qsize: 16
	  s: psd vars: 3, ssize: 1
settings: eps_abs: 1.0e-09, eps_rel: 1.0e-12, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 205, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.12e+01  1.00e+00  5.25e+01 -2.59e+01  1.00e-01  2.82e-04 
   250| 2.10e-04  6.08e-04  2.61e-04  8.23e-01  5.19e-01  1.36e-03 
   450| 4.00e-11  2.08e-13  4.01e-11  8.23e-01  5.19e-01  2.05e-03 
------------------------------------------------------------------
status:  solved
timings: total: 2.06e-03s = setup: 2.17e-04s + solve: 1.84e-03s
	 lin-sys: 8.52e-04s, cones: 5.85e-04s, accel: 6.00e-05s
------------------------------------------------------------------
objective = 0.823033
------------------------------------------------------------------
(0.8230325137470799, 0.8002661860842999)

We’d expect that the final two numbers are identical.

I assume that this is happening because the reformulation from the original problem into a conic form that SCS can solve introduces additional variables and constraints that allow some tolerance issues, but I wouldn’t have expected the two values to be so different. Although because it’s a matrix norm, small changes in the variable values can cause larger changes in the output.

@ericphanson might have some insight into what’s going on.

Thanks for checking out the issue.

I’ve since tried simplifying the optimisation to the following minimisation to isolate where the issue stems from:

\begin{aligned} \min_{t,\rho} \;&\; t,\\ \text{for} \;&\; J := \sum_{j,k} E_{jk} \otimes \left\{ U^\dagger \left[ K\left(E_{jk} \otimes I\right)K^\dagger - E_{jk} \otimes \rho \right] U\right\} \\ \text{subject to}\;&\;\operatorname{tr}\rho = 1,\\ &\; \rho \succeq 0, \\ &\; tI - J \succeq 0, \\ &\; tI + J \succeq 0. \end{aligned}

Here E_{jk} is the matrix with 1 on the j-th row, k-th column, and zero everywhere else; I is the identity matrix. This explicitly minimises the spectral norm of J. The spectral norm is unitarily invariant, so it should give the same result no matter what U is.

My findings are

  • JuMP gives consistent results using the same optimiser and tolerances, so it’s a problem with Convex.jl
  • The issue only occurs when both U and K are complex. It doesn’t happen when either of them is purely real or purely imaginary

Here’s my code:

using Convex, SCS, LinearAlgebra, JuMP

function opt_Convex(U,K;d₁,d₂)
    ρ = Semidefinite(d₂)
    t = Variable()
    J = sum(kron(
            ((1:d₁).==j)*((1:d₁).==k)',
            U'*(
                  K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                    kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
             )*U
        ) for j ∈ 1:d₁, k ∈ 1:d₁)

    constraints = [
        tr(ρ) == 1,
        t*I(d₁^2*d₂)-J ⪰ 0,
        t*I(d₁^2*d₂)+J ⪰ 0,
    ]

    optimizer = SCS.Optimizer()
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_rel"), 1E-12)

    problem = minimize(t, constraints)
    solve!(problem, optimizer)

    return problem.optval, opnorm(evaluate(J), 2)
end

function opt_JuMP(U,K;d₁,d₂)
    model = Model(SCS.Optimizer)
    @variable(model, ρ[1:d₂,1:d₂] ∈ HermitianPSDCone())
    @variable(model, t)
    J = sum(kron(
            ((1:d₁).==j)*((1:d₁).==k)',
            U'*(
                  K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                    kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
             )*U
        ) for j ∈ 1:d₁, k ∈ 1:d₁)

    @constraint(model, tr(ρ)==1)
    @constraint(model, Hermitian(t*I(d₁^2*d₂)-J) ∈ HermitianPSDCone())
    @constraint(model, Hermitian(t*I(d₁^2*d₂)+J) ∈ HermitianPSDCone())

    MOI.set(model, MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
    MOI.set(model, MOI.RawOptimizerAttribute("eps_rel"), 1E-12)

    @objective(model, Min, t)
    optimize!(model)

    return objective_value(model), opnorm(value.(J), 2)
end

d₁,d₂ = 2,2;

# K[1] is purely real, K[2] is purely imaginary, K[3] is complex
K = Any[reshape(1:(d₁*d₂)^2,d₁*d₂,d₁*d₂) + im*reshape((d₁*d₂)^2+1:2*(d₁*d₂)^2,d₁*d₂,d₁*d₂) for _ ∈ 1:3];
K[3] *= sqrt(d₁/tr(K[3]'K[3]));
K[1] = real.(K[3]);
K[1] *= sqrt(d₁/tr(K[1]'K[1]));
K[2] = K[1]*im;

# The unitaries are identities with complex phases
# U[1] is purely real, U[2] is purely imaginary, U[3] is complex
U = Any[I(d₁*d₂) for _ ∈ 1:3];
U[2] = U[1]*im;
U[3] = U[1]*exp(im*π/4);

# Here you’ll find that Δarr[:,:,:,2] (JuMP) is consistent
# between the optimal and evaluated values, while there is
# a discrepancy in Δarr[:,:,:,1] (Convex.jl), but only at
# Δarr[3,3,:,1] when both U and K are complex
Δarr = zeros(3,3,2,2);
for j ∈ 1:3, k ∈ 1:3
    Δarr[j,k,:,1] .= opt_Convex(U[j],K[k];d₁,d₂);
    Δarr[j,k,:,2] .= opt_JuMP(U[j],K[k];d₁,d₂);
end

With the output

julia> Δarr
3×3×2×2 Array{Float64, 4}:
[:, :, 1, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44032

[:, :, 2, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44759

[:, :, 1, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

[:, :, 2, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

Each column should be consistent as they are optimised over the same K. The only discrepancy happens for Convex.jl ([:,:,1,1] and [:,:,2,1]) with both U and K complex.

Maybe it has to do with some routine that separates the real and imaginary parts when Convex.jl preprocesses the variables before sending it to the optimiser?

You could open an issue at Issues · jump-dev/Convex.jl · GitHub to report this, but be aware that Convex.jl is currently in limited maintenance mode (GitHub - jump-dev/Convex.jl: A Julia package for disciplined convex programming), so it’s unlikely to get fixed unless you dig in and find the problem. (I’m happy to review a PR, but I’m a novice when it comes to the internals of Convex, so I don’t even know where you should start to look :frowning_face:)

Ah, that’s a pity, I’ve enjoyed using Convex very much. Guess I’d stick to using JuMP for now, and maybe dig a little deeper into Convex when I’m procrastinating on my own thesis :sweat_smile:. Thanks anyway!

Please do file an issue! As Oscar said, the package is indeed in need of a maintainer, but having issues with open bugs helps a lot (…sometimes I find time to contribute a bit still). This seems like a serious issue (a correctness bug) so it is good it is tracked.

On the branch More MOIified implementation, again by ericphanson · Pull Request #504 · jump-dev/Convex.jl · GitHub, I get

In [23]: Δarr
Out[23]: 3×3×2×2 Array{Float64, 4}:
[:, :, 1, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642

[:, :, 2, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642

[:, :, 1, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

[:, :, 2, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

I’ll admit I haven’t read closely enough to understand if this is good or not :sweat_smile:.

Submitted a bug report with a simpler optimisation that still exhibits the behaviour :slight_smile:

Oddly enough, I checked out the eph/moi2-again branch and got an error when trying to take a tensor product:

julia> kron(I(2),Semidefinite(2))
ERROR: MethodError: no method matching Matrix{Bool}(::Vector{Bool})
Closest candidates are:
  Array{T, N}(::AbstractArray{S, N}) where {T, N, S} at array.jl:540
  Matrix{T}(::LinearAlgebra.LQPackedQ) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/lq.jl:154
  Matrix{T}(::LinearAlgebra.HessenbergQ{var"#s814", var"#s8141", var"#s813", false} where {var"#s814", var"#s8141"<:StridedMatrix{var"#s814"}, var"#s813"<:StridedVector{var"#s814"}}) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/hessenberg.jl:467
  ...
Stacktrace:
 [1] convert(#unused#::Type{Matrix{Bool}}, a::Vector{Bool})
   @ Base ./array.jl:532
 [2] Convex.Constant(x::Vector{Bool}, sign::Positive)
   @ Convex ~/Desktop/Convex.jl/src/constant.jl:41
 [3] Convex.Constant(x::Vector{Bool}, check_sign::Bool) (repeats 2 times)
   @ Convex ~/Desktop/Convex.jl/src/constant.jl:44
 [4] constant(x::Bool)
   @ Convex ~/Desktop/Convex.jl/src/constant.jl:99
 [5] *(x::Bool, y::Variable)
   @ Convex ~/Desktop/Convex.jl/src/atoms/affine/multiply_divide.jl:167
 [6] kron(a::Diagonal{Bool, Vector{Bool}}, b::Variable)
   @ Convex ~/Desktop/Convex.jl/src/atoms/affine/kron.jl:7
 [7] top-level scope
   @ none:1

While kron(Semidefinite(2),I(2)) worked. Even so, tr(Semidefinite(2)) == 1 gave a similar error, so I wasn’t able to confirm what you found.