Minimisation of operator norm: solution does not match evaluated value [Convex.jl]

I’m using Convex.jl to minimize the operator norm of a matrix (minimize opnorm(J, Inf)), but the reported solution (problem.optval) does not match the objective evaluated after the optimisation (opnorm(evaluate(J), Inf)).

A simplified example follows. The matrix takes a specific form (a Choi matrix of the difference of two quantum channels) that depends on the problem at hand, but randomly generated matrices demonstrate the issue.

using Convex, SCS, LinearAlgebra

function opt_norm_issue(U,K;d₁,d₂)
    ρ = Semidefinite(d₂)
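    # ((1:d₁).==j)*((1:d₁).==k)' below builds the d₁×d₁ matrix unit E_jk
    # (1 at position (j,k), zeros elsewhere); J is then the Choi matrix of
    # the difference of the two channels described above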
    J = sum(kron(
            ((1:d₁).==j)*((1:d₁).==k)',
            partialtrace(
                U'*(
                    K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                      kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
                )*U
            , 2, [d₁,d₂])
        ) for j ∈ 1:d₁, k ∈ 1:d₁)

    constraints = [
        tr(ρ) == 1
    ]

    optimizer = SCS.Optimizer()
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_rel"), 1E-12)

    problem = minimize(opnorm(J,Inf), constraints)
    solve!(problem, optimizer)

    return problem.optval, opnorm(evaluate(J), Inf)
end

Running the following lines a few times with different values of d₁ and d₂ shows that problem.optval and opnorm(evaluate(J), Inf) do not match, and can be quite different for some K and U.

d₁,d₂ = 2,2;
H = rand(ComplexF64,d₁*d₂,d₁*d₂);
U = exp(im*π*(H+H'));
K = rand(ComplexF64,d₁*d₂,d₁*d₂);
K *= d₁/tr(K'K);

opt_norm_issue(U,K;d₁,d₂)

Any ideas why this is happening? It’s particularly problematic because I need the optimising arguments themselves, but they don’t actually attain the (apparent) optimal value.

Hi there, welcome to the forum.

This is an interesting problem. For others reading, this is what I get:

julia> using Convex, SCS, LinearAlgebra

julia> const MOI = Convex.MOI
MathOptInterface

julia> function opt_norm_issue(U, K; d₁, d₂)
           ρ = Semidefinite(d₂)
           J = sum(
               kron(
                   ((1:d₁).==j)*((1:d₁).==k)',
                   partialtrace(
                       U'*(
                           K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                             kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
                       )*U
                   , 2, [d₁,d₂])
               )
               for j in 1:d₁, k in 1:d₁
           )
           constraints = [LinearAlgebra.tr(ρ) == 1]
           solver = SCS.Optimizer()
           MOI.set(solver, MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
           MOI.set(solver, MOI.RawOptimizerAttribute("eps_rel"), 1E-12)
           problem = minimize(opnorm(J, Inf), constraints)
           solve!(problem, solver)
           return problem.optval, opnorm(evaluate(J), Inf)
       end
opt_norm_issue (generic function with 1 method)

julia> d₁, d₂ = 2, 2;

julia> H = rand(ComplexF64, d₁ * d₂, d₁ * d₂);

julia> U = exp(im * π * (H + H'));

julia> K = rand(ComplexF64, d₁ * d₂, d₁ * d₂);

julia> K *= d₁ / LinearAlgebra.tr(K' * K);

julia> opt_norm_issue(U, K; d₁, d₂)
------------------------------------------------------------------
	       SCS v3.2.3 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 38, constraints m: 74
cones: 	  z: primal zero / dual free vars: 3
	  l: linear vars: 20
	  q: soc vars: 48, qsize: 16
	  s: psd vars: 3, ssize: 1
settings: eps_abs: 1.0e-09, eps_rel: 1.0e-12, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 205, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.12e+01  1.00e+00  5.25e+01 -2.59e+01  1.00e-01  2.82e-04 
   250| 2.10e-04  6.08e-04  2.61e-04  8.23e-01  5.19e-01  1.36e-03 
   450| 4.00e-11  2.08e-13  4.01e-11  8.23e-01  5.19e-01  2.05e-03 
------------------------------------------------------------------
status:  solved
timings: total: 2.06e-03s = setup: 2.17e-04s + solve: 1.84e-03s
	 lin-sys: 8.52e-04s, cones: 5.85e-04s, accel: 6.00e-05s
------------------------------------------------------------------
objective = 0.823033
------------------------------------------------------------------
(0.8230325137470799, 0.8002661860842999)

We’d expect the final two numbers to be identical, or at least to agree to something like the 1e-9 solver tolerance, yet they differ by about 0.023.

I assume that this is happening because the reformulation of the original problem into a conic form that SCS can solve introduces additional variables and constraints, which leave room for tolerance issues, but I wouldn’t have expected the two values to be so different. That said, because the objective is a matrix norm, small changes in the variable values can cause larger changes in the output.
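For contrast, here is a minimal sketch (a toy problem, nothing to do with the reformulation internals) of what we’d normally expect: the reported optimum and the objective re-evaluated at the returned variable values agree to roughly the solver tolerance.

using Convex, SCS, LinearAlgebra

# Toy problem: minimize the ∞-norm of a 2-vector on the plane x₁ + x₂ = 1.
# The optimum is x = (0.5, 0.5) with objective 0.5.
x = Variable(2)
p = minimize(norm(x, Inf), [sum(x) == 1])
solve!(p, SCS.Optimizer)
p.optval, evaluate(norm(x, Inf))  # both ≈ 0.5; here they agree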

@ericphanson might have some insight into what’s going on.

Thanks for checking out the issue.

To isolate where the issue stems from, I’ve since simplified the optimisation to the following minimisation:

\begin{aligned}
\min_{t,\rho} \;&\; t, \\
\text{where} \;&\; J := \sum_{j,k} E_{jk} \otimes \left\{ U^\dagger \left[ K \left( E_{jk} \otimes I \right) K^\dagger - E_{jk} \otimes \rho \right] U \right\}, \\
\text{subject to} \;&\; \operatorname{tr}\rho = 1, \\
&\; \rho \succeq 0, \\
&\; tI - J \succeq 0, \\
&\; tI + J \succeq 0.
\end{aligned}

Here E_{jk} is the matrix with a 1 in the j-th row, k-th column and zeros everywhere else, and I is the identity matrix. This explicitly minimises the spectral norm of J. The spectral norm is unitarily invariant, so the result should be the same no matter what U is.
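As a quick numerical sanity check of that invariance (not part of the optimisation itself):

using LinearAlgebra

# exp of i × (Hermitian) is unitary, and conjugation by a unitary
# leaves the spectral norm unchanged
A = rand(ComplexF64, 4, 4);
H = rand(ComplexF64, 4, 4);
Q = exp(im * (H + H'));
opnorm(Q' * A * Q, 2) ≈ opnorm(A, 2)  # true, up to floating-point error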

My findings are:

  • JuMP gives consistent results using the same optimiser and tolerances, so it’s a problem with Convex.jl.
  • The issue only occurs when both U and K are complex; it doesn’t happen when either of them is purely real or purely imaginary.

Here’s my code:

using Convex, SCS, LinearAlgebra, JuMP

function opt_Convex(U,K;d₁,d₂)
    ρ = Semidefinite(d₂)
    t = Variable()
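    # t is the epigraph variable: combined with the constraints
    # t*I - J ⪰ 0 and t*I + J ⪰ 0 below, minimising t minimises the
    # spectral norm of the Hermitian matrix J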
    J = sum(kron(
            ((1:d₁).==j)*((1:d₁).==k)',
            U'*(
                  K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                    kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
             )*U
        ) for j ∈ 1:d₁, k ∈ 1:d₁)

    constraints = [
        tr(ρ) == 1,
        t*I(d₁^2*d₂)-J ⪰ 0,
        t*I(d₁^2*d₂)+J ⪰ 0,
    ]

    optimizer = SCS.Optimizer()
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
    Convex.MOI.set(optimizer, Convex.MOI.RawOptimizerAttribute("eps_rel"), 1E-12)

    problem = minimize(t, constraints)
    solve!(problem, optimizer)

    return problem.optval, opnorm(evaluate(J), 2)
end

function opt_JuMP(U,K;d₁,d₂)
    model = Model(SCS.Optimizer)
    @variable(model, ρ[1:d₂,1:d₂] ∈ HermitianPSDCone())
    @variable(model, t)
    J = sum(kron(
            ((1:d₁).==j)*((1:d₁).==k)',
            U'*(
                  K*kron(((1:d₁).==j)*((1:d₁).==k)', I(d₂))*K' -
                    kron(((1:d₁).==j)*((1:d₁).==k)', ρ)
             )*U
        ) for j ∈ 1:d₁, k ∈ 1:d₁)

    @constraint(model, tr(ρ)==1)
    @constraint(model, Hermitian(t*I(d₁^2*d₂)-J) ∈ HermitianPSDCone())
    @constraint(model, Hermitian(t*I(d₁^2*d₂)+J) ∈ HermitianPSDCone())

    MOI.set(model, MOI.RawOptimizerAttribute("eps_abs"), 1E-9)
    MOI.set(model, MOI.RawOptimizerAttribute("eps_rel"), 1E-12)

    @objective(model, Min, t)
    optimize!(model)

    return objective_value(model), opnorm(value.(J), 2)
end

d₁,d₂ = 2,2;

# K[1] is purely real, K[2] is purely imaginary, K[3] is complex
K = Any[reshape(1:(d₁*d₂)^2,d₁*d₂,d₁*d₂) + im*reshape((d₁*d₂)^2+1:2*(d₁*d₂)^2,d₁*d₂,d₁*d₂) for _ ∈ 1:3];
K[3] *= sqrt(d₁/tr(K[3]'K[3]));
K[1] = real.(K[3]);
K[1] *= sqrt(d₁/tr(K[1]'K[1]));
K[2] = K[1]*im;

# The unitaries are identities with complex phases
# U[1] is purely real, U[2] is purely imaginary, U[3] is complex
U = Any[I(d₁*d₂) for _ ∈ 1:3];
U[2] = U[1]*im;
U[3] = U[1]*exp(im*π/4);

# Here you’ll find that Δarr[:,:,:,2] (JuMP) is consistent
# between the optimal and evaluated values, while there is
# a discrepancy in Δarr[:,:,:,1] (Convex.jl), but only at
# Δarr[3,3,:,1] when both U and K are complex
Δarr = zeros(3,3,2,2);
for j ∈ 1:3, k ∈ 1:3
    Δarr[j,k,:,1] .= opt_Convex(U[j],K[k];d₁,d₂);
    Δarr[j,k,:,2] .= opt_JuMP(U[j],K[k];d₁,d₂);
end

With the output

julia> Δarr
3×3×2×2 Array{Float64, 4}:
[:, :, 1, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44032

[:, :, 2, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44759

[:, :, 1, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

[:, :, 2, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

Each column should be constant: every entry in a column is optimised over the same K, and the spectral norm is invariant under the different U. The only discrepancy happens for Convex.jl ([:,:,1,1] and [:,:,2,1]), and only when both U and K are complex.
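A one-liner over the array above makes the worst-case mismatch per backend explicit:

# dims 1–2 index (U, K), dim 3 is (optval, evaluated), dim 4 is (Convex.jl, JuMP);
# this reports the largest |optval - evaluated| for each backend
maximum(abs.(Δarr[:,:,1,:] .- Δarr[:,:,2,:]); dims=(1,2))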

Maybe it has to do with some routine that separates the real and imaginary parts when Convex.jl preprocesses the variables before sending them to the optimiser?
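For reference, here’s the standard embedding (a sketch of the mathematics, not necessarily Convex.jl’s actual code path): a Hermitian M = A + iB is mapped to the real symmetric block matrix [A -B; B A], whose spectrum is that of M with each eigenvalue doubled, so complex PSD constraints reduce to real ones. A bookkeeping error in such a split would surface exactly when both the real and imaginary parts are nonzero, which matches the pattern above.

using LinearAlgebra

# Real embedding of a Hermitian matrix M = A + iB: the block matrix
# [A -B; B A] is real symmetric and carries the eigenvalues of M,
# each repeated twice
M = rand(ComplexF64, 3, 3); M = M + M';
A, B = real(M), imag(M)
R = [A -B; B A]
sort(eigvals(Symmetric(R))) ≈ repeat(eigvals(Hermitian(M)), inner = 2)  # true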

You could open an issue at Issues · jump-dev/Convex.jl · GitHub to report this, but be aware that Convex.jl is currently in limited maintenance mode (GitHub - jump-dev/Convex.jl: A Julia package for disciplined convex programming), so it’s unlikely to get fixed unless you dig in and find the problem. (I’m happy to review a PR, but I’m a novice when it comes to the internals of Convex, so I don’t even know where you should start to look :frowning_face:)

Ah, that’s a pity; I’ve enjoyed using Convex very much. I guess I’ll stick to JuMP for now, and maybe dig a little deeper into Convex when I’m procrastinating on my own thesis :sweat_smile:. Thanks anyway!

Please do file an issue! As Oscar said, the package is indeed in need of a maintainer, but having open issues for known bugs helps a lot (…sometimes I still find time to contribute a bit). This seems like a serious issue (a correctness bug), so it’s good to have it tracked.

On the branch More MOIified implementation, again by ericphanson · Pull Request #504 · jump-dev/Convex.jl · GitHub, I get

In [23]: Δarr
Out[23]: 3×3×2×2 Array{Float64, 4}:
[:, :, 1, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642

[:, :, 2, 1] =
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642
 1.52194  1.52194  1.44642

[:, :, 1, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

[:, :, 2, 2] =
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633
 1.52194  1.52194  1.44633

I’ll admit I haven’t read closely enough to understand if this is good or not :sweat_smile:.

Submitted a bug report with a simpler optimisation that still exhibits the behaviour :slight_smile:

Oddly enough, I checked out the eph/moi2-again branch and got an error when trying to take a tensor product:

julia> kron(I(2),Semidefinite(2))
ERROR: MethodError: no method matching Matrix{Bool}(::Vector{Bool})
Closest candidates are:
  Array{T, N}(::AbstractArray{S, N}) where {T, N, S} at array.jl:540
  Matrix{T}(::LinearAlgebra.LQPackedQ) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/lq.jl:154
  Matrix{T}(::LinearAlgebra.HessenbergQ{var"#s814", var"#s8141", var"#s813", false} where {var"#s814", var"#s8141"<:StridedMatrix{var"#s814"}, var"#s813"<:StridedVector{var"#s814"}}) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/hessenberg.jl:467
  ...
Stacktrace:
 [1] convert(#unused#::Type{Matrix{Bool}}, a::Vector{Bool})
   @ Base ./array.jl:532
 [2] Convex.Constant(x::Vector{Bool}, sign::Positive)
   @ Convex ~/Desktop/Convex.jl/src/constant.jl:41
 [3] Convex.Constant(x::Vector{Bool}, check_sign::Bool) (repeats 2 times)
   @ Convex ~/Desktop/Convex.jl/src/constant.jl:44
 [4] constant(x::Bool)
   @ Convex ~/Desktop/Convex.jl/src/constant.jl:99
 [5] *(x::Bool, y::Variable)
   @ Convex ~/Desktop/Convex.jl/src/atoms/affine/multiply_divide.jl:167
 [6] kron(a::Diagonal{Bool, Vector{Bool}}, b::Variable)
   @ Convex ~/Desktop/Convex.jl/src/atoms/affine/kron.jl:7
 [7] top-level scope
   @ none:1

On the other hand, kron(Semidefinite(2), I(2)) worked. However, even tr(Semidefinite(2)) == 1 gave a similar error, so I wasn’t able to confirm what you found.
