Thanks for your help, Oscar!
Let me begin by describing the program we are solving. Given n-by-n positive semidefinite matrices A and W and a set of constrained off-diagonal entries \Omega \subseteq \{ (i,j) : 1 \leq i < j \leq n \}, we seek to solve the following:
\min_{\substack{X \succeq 0 \\ X(i,j) = -A(i,j) \ \forall (i,j) \in \Omega}} \operatorname{tr}(X W) + \gamma \cdot \| A + X \|_F^2
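For reference, a direct JuMP transcription of this program would look like the sketch below (direct_program is a hypothetical helper of mine, assuming A, W, Omega, and gamma are in scope; it is just to fix notation and is not tuned for performance). The MWE that follows instead builds X entrywise from scalar variables.

using JuMP, SCS, LinearAlgebra

function direct_program(A, W, Omega, gamma)
    n = size(A, 1)
    model = Model(SCS.Optimizer)
    # X is an n-by-n symmetric PSD matrix variable
    @variable(model, X[1:n, 1:n], PSD)
    # fix the constrained off-diagonal entries
    for (i, j) in Omega
        @constraint(model, X[i, j] == -A[i, j])
    end
    # tr(XW) plus the squared-Frobenius-norm penalty
    @objective(model, Min, tr(X * W) + gamma * sum((A .+ X) .^ 2))
    optimize!(model)
    return value.(X)
end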
Here’s a MWE which creates a random problem instance, builds a JuMP model, and passes it to a solver. My apologies that it is not so minimal. Our problem size of interest is n=2000, but here I used n=150 for speed.
using JuMP
using SCS, MosekTools, Clarabel
using LinearAlgebra
using StatsBase
using Random
using BenchmarkTools
function create_rand_psd(n)
    # random PSD matrix, normalized to unit trace
    Y = randn(n, n)
    A = Y * Y'
    return A / tr(A)
end
function create_constrained_entries(n)
    # sample a random 10% of the off-diagonal entries to be constrained
    all_off_diag = [(i, j) for i = 1:n, j = 1:n if i < j]
    num_constrained = ceil(Int, 0.1 * length(all_off_diag))
    return sample(all_off_diag, num_constrained, replace=false)
end
function mwe_program(A, W, constrained_entries, fro_penalty, solver)
    # get dimensions
    m = size(A, 1)
    @assert fro_penalty > 0
    # create the model
    model = Model(solver.Optimizer)
    # use a Set for O(1) membership tests inside the loop below
    ce_set = Set(constrained_entries)
    num_ce = length(ce_set)
    # one free scalar variable x[ind] per unconstrained upper-triangular entry
    num_vars = div(m * (m + 1), 2) - num_ce
    @variable(model, x[1:num_vars])
    # create the expression for the objective function
    obj_ex = zero(QuadExpr)
    # construct the symmetric affine JuMP expression X, filling in only the upper triangle
    X = zeros(AffExpr, m, m)
    ind = 1
    for i = 1:m
        for j = i:m
            if (i, j) in ce_set
                X[i, j] += -A[i, j]
            else
                # set the element in X
                X[i, j] += x[ind]
                # off-diagonal entries are counted twice in the symmetric matrix
                count_coeff = (i == j) ? 1.0 : 2.0
                # update the objective function
                if W[i, j] != 0.0
                    add_to_expression!(obj_ex, count_coeff * W[i, j], x[ind]) # linear part, tr(XW)
                end
                add_to_expression!(obj_ex, count_coeff * fro_penalty, (x[ind] + A[i, j])^2) # Frobenius penalty
                # update index
                ind += 1
            end
        end
    end
    # constraint that X is psd
    X = Symmetric(X)
    @constraint(model, X in PSDCone())
    @objective(model, Min, obj_ex)
    # run the optimization
    optimize!(model)
    # return the optimal matrix
    return value.(X)
end
# set seed
Random.seed!(123)
# create problem instance
println("\nCreating the problem instance...")
n = 150
A = create_rand_psd(n)
W = create_rand_psd(n)
constrained_entries = create_constrained_entries(n)
fro_penalty = 1.0
# run the instance -- 775.830 ms (833897 allocations: 63.13 MiB)
println("\nBenchmarking the JuMP program with SCS...")
@btime X = mwe_program(A, W, constrained_entries, fro_penalty, SCS)
# run the instance -- 45.985 s (934494 allocations: 72.71 MiB)
println("\nBenchmarking the JuMP program with Mosek...")
@btime X = mwe_program(A, W, constrained_entries, fro_penalty, Mosek)
# run the instance -- doesn't complete a first iteration within 2 minutes
println("\nBenchmarking the JuMP program with Mosek...")
@btime X = mwe_program(A, W, constrained_entries, fro_penalty, Clarabel)
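One reformulation I have seen suggested for conic solvers is to keep the objective linear and move the quadratic term into a rotated second-order cone via an epigraph variable. A sketch is below (epigraph_program is a hypothetical name of mine, using the same inputs as direct_program above; I have not benchmarked it at n=2000). It uses JuMP's rotated second-order cone, which models {(t, u, z) : 2tu >= ||z||^2}, so [t; 0.5; z] enforces t >= ||z||^2.

using JuMP, SCS, LinearAlgebra

function epigraph_program(A, W, Omega, gamma)
    n = size(A, 1)
    model = Model(SCS.Optimizer)
    @variable(model, X[1:n, 1:n], PSD)
    @variable(model, t >= 0)
    # fix the constrained off-diagonal entries
    for (i, j) in Omega
        @constraint(model, X[i, j] == -A[i, j])
    end
    # t >= ||A + X||_F^2 via the rotated second-order cone
    @constraint(model, [t; 0.5; vec(A .+ X)] in RotatedSecondOrderCone())
    @objective(model, Min, tr(X * W) + gamma * t)
    optimize!(model)
    return value.(X)
end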
Any help or advice in reformulating the model would be greatly appreciated!
Try using MosekTools.jl (commercial) or Clarabel.jl (open source) instead.
Thanks for the advice. Could you elaborate on why MosekTools.jl or Clarabel.jl would be preferable? I ran a quick benchmark on my laptop (in the MWE) that tests MosekTools.jl and Clarabel.jl against SCS.jl. Although it’s a much smaller problem size, SCS.jl seems to outperform both of them. In fact, Clarabel.jl does not complete its first iteration after a few minutes at n=150, though it does complete in 6 seconds at n=50. I haven’t yet run a test at a larger size, though.
Not really, and I don’t know if there would be much benefit. The algorithms are iterative, so they don’t parallelize very well.
Thanks for the advice here. It makes sense that parallelization may not be beneficial because the algorithms are iterative. But I would have guessed that the underlying linear system solves would be amenable to parallelization. Do you mind explaining a bit more why this isn’t the case?
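For instance, I would naively expect a rough check like the following to show a speedup from BLAS threads (this is my assumption, using a dense Cholesky factorization as a stand-in for the per-iteration solve; I realize the solvers' internal linear algebra may be sparse and linked against a different BLAS):

using LinearAlgebra

# stand-in for one iteration's linear solve: a dense Cholesky
# factorization of a well-conditioned positive definite matrix
n = 2000
Y = randn(n, n)
S = Symmetric(Y * Y' + n * I)
BLAS.set_num_threads(1)
@time cholesky(S)
BLAS.set_num_threads(4)
@time cholesky(S)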