Bottleneck of JuMP

Try with:

function dual_optimizer()
    inner = MosekTools.Optimizer()
    dual_problem = Dualization.DualProblem{Float64}(inner)
    return Dualization.DualOptimizer{Float64,typeof(inner)}(dual_problem)
end

model = Model(dual_optimizer)

The underlying issue is:

  1. Mosek does not support HermitianPSDCone directly, so JuMP reformulates the problem into a larger PSD model.
  2. Model(Dualization.dual_optimizer(Mosek.Optimizer)) creates two sets of bridges: one before the dual_optimizer and one between dualization and Mosek.
  3. Because there is a bridge after dualization, we dualise the Hermitian matrices, rather than their PSD equivalent.
  4. The dualized Hermitian matrices then get bridged back into the unfavourable form by JuMP.
  5. The solution is to remove the bridges between the dual layer and Mosek. Then the Hermitian matrices will be reformulated first and dualized second.
  6. Dualization doesn’t have an easy way to achieve this at the moment, which is why we need the ungainly Dualization.DualProblem. I’ll open an issue to fix this.
  7. I think this is fundamentally a design issue with Mosek: it should internally decide whether to solve the primal or the dual. It shouldn’t force users to decide how to write their model to satisfy it.
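One way to see the extra bridging layer described in steps 2–4 is JuMP's `print_active_bridges`. A minimal sketch (it needs a Mosek license to run, and the exact bridge list depends on your JuMP and Dualization versions):

```julia
using JuMP, MosekTools, Dualization

# Build a tiny Hermitian PSD model through the dual_optimizer shortcut.
model = Model(Dualization.dual_optimizer(Mosek.Optimizer))
@variable(model, H[1:2, 1:2], Hermitian)
@constraint(model, Hermitian(H) in HermitianPSDCone())
@objective(model, Min, real(H[1, 1]))

# Prints the active bridge graph. With this layering you should see a
# Hermitian-to-PSD bridge sitting after the dualization layer, which is
# the unfavourable order described above.
print_active_bridges(model)
```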
julia> using LinearAlgebra

julia> using SparseArrays

julia> using Random

julia> using JuMP

julia> using MosekTools

julia> using Dualization

julia> function minimal_psd_test(; use_dual=false, seed=1234)
           Random.seed!(seed)

           # Rough toy sizes only
           NQ   = 12          # number of q sectors
           nred = 8           # size of the two small Hermitian blocks
           nbig = 18          # size of the one larger Hermitian block
           t2n  = 300         # T2 matrix size
           nzpr = 6           # sparse-map entries per row

           # Total length of vectorized D variables
           nd = NQ*nred^2 + NQ*nred^2 + NQ*nbig^2

           # Random sparse map M : Dvec -> T2vec
           rows = Int[]
           cols = Int[]
           vals = Float64[]
           for r in 1:(t2n*t2n)
               js = rand(1:nd, nzpr)
               vs = randn(nzpr)
               for k in 1:nzpr
                   push!(rows, r)
                   push!(cols, js[k])
                   push!(vals, vs[k])
               end
           end
           M = sparse(rows, cols, vals, t2n*t2n, nd)

           # Random objective coefficients
           c = randn(nd)

           function dual_optimizer()
               inner = MosekTools.Optimizer()
               dual_problem = Dualization.DualProblem{Float64}(inner)
               return Dualization.DualOptimizer{Float64,typeof(inner)}(dual_problem)
           end
           if use_dual
               model = Model(dual_optimizer)
           else
               model = Model(Mosek.Optimizer)
           end

           set_string_names_on_creation(model, false)
           set_attribute(model, "MSK_IPAR_LOG", 10)

           # Analog of your Dmatrix
           Dmatrix = Vector{Any}(undef, 3)
           Dmatrix[1] = Vector{Any}(undef, NQ)
           Dmatrix[2] = Vector{Any}(undef, NQ)
           Dmatrix[3] = Vector{Any}(undef, NQ)

           for q in 1:NQ
               Dmatrix[1][q] = @variable(model, [1:nred, 1:nred], Hermitian)
               Dmatrix[2][q] = @variable(model, [1:nred, 1:nred], Hermitian)
               Dmatrix[3][q] = @variable(model, [1:nbig, 1:nbig], Hermitian)

               @constraint(model, Hermitian(Dmatrix[1][q]) in HermitianPSDCone())
               @constraint(model, Hermitian(Dmatrix[2][q]) in HermitianPSDCone())
               @constraint(model, Hermitian(Dmatrix[3][q]) in HermitianPSDCone())
           end

           # Vectorize in the same spirit as your real code
           Dvec = vcat(
               [vec(Dmatrix[1][q]) for q in 1:NQ]...,
               [vec(Dmatrix[2][q]) for q in 1:NQ]...,
               [vec(Dmatrix[3][q]) for q in 1:NQ]...
           )

           # Sparse affine map -> large matrix
           T2vec = M * Dvec
           T2raw = reshape(T2vec, t2n, t2n)
           T2herm = 0.5 * (T2raw + T2raw')

           @constraint(model, Hermitian(T2herm) in HermitianPSDCone())

           @objective(model, Min, sum(c[i] * real(Dvec[i]) for i in 1:nd))

         

           optimize!(model)

           return model
       end
minimal_psd_test (generic function with 1 method)

julia> minimal_psd_test(; use_dual = true)
Problem
  Name                   :                 
  Objective sense        : maximize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 5424            
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 0               
  Matrix variables       : 37 (scalarized: 191556)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.00            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.00    
Optimizer  - threads                : 10              
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 5424            
Optimizer  - Cones                  : 0               
Optimizer  - Scalar variables       : 0                 conic                  : 0               
Optimizer  - Semi-definite variables: 37                scalarized             : 191556          
Factor     - setup time             : 0.52            
Factor     - dense det. time        : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 1.47e+07          after factor           : 1.47e+07        
Factor     - dense dim.             : 0                 flops                  : 9.02e+11        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   4.2e+00  1.0e+00  1.0e+00  0.00e+00   -0.000000000e+00  -0.000000000e+00  1.0e+00  1.15  
1   1.7e-01  4.0e-02  1.1e-01  -9.10e-01  -0.000000000e+00  -6.675589379e+00  4.0e-02  4.82  
2   1.5e-04  3.6e-05  1.1e-06  9.53e-01   -0.000000000e+00  -6.099937757e-04  3.6e-05  8.54  
3   9.8e-15  6.5e-19  1.1e-26  1.00e+00   -0.000000000e+00  -2.696191184e-16  6.4e-19  12.25 
Optimizer terminated. Time: 12.26   

A JuMP Model
├ solver: Dual model with Mosek attached
├ objective_sense: MIN_SENSE
│ └ objective_function_type: AffExpr
├ num_variables: 5424
├ num_constraints: 37
│ └ Vector{AffExpr} in MOI.HermitianPositiveSemidefiniteConeTriangle: 37
└ Names registered in the model: none

I’d like to hear how to write memory-efficient code with JuMP. Or better: to read a page in JuMP’s performance-tips documentation about this.

For larger models, building a JuMP model can take quite a lot of time and memory (meaning a lot of space is allocated in Julia, which leaves less memory budget for the external solver), and building JuMP models doesn’t parallelize well across multiple threads, compared to the solution algorithms executed by solvers (e.g. solving multiple MIPs can be parallelized efficiently). So, speaking fairly, this can be seen as a “bottleneck due to JuMP”.
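On the memory point, one technique the JuMP documentation suggests is `direct_model`, which builds the problem directly in the solver's memory instead of in JuMP's caching layer. A sketch (assuming a licensed Mosek install; note that direct mode disables bridges, so the formulation must be natively supported by the solver):

```julia
using JuMP, MosekTools

# direct_model skips JuMP's intermediate cache, so each variable and
# constraint is written straight into Mosek instead of being stored twice.
model = direct_model(Mosek.Optimizer())

# Skipping string names also saves a noticeable amount of memory.
set_string_names_on_creation(model, false)

@variable(model, x[1:1000] >= 0)
@objective(model, Min, sum(x))
optimize!(model)
```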

This post has nothing to do with JuMP or Julia. It is purely a performance issue inside Mosek. How you formulate the problem matters.

Thanks!
I wonder whether there is a rule of thumb that tells me, for these kinds of problems, when I should dualize and when I shouldn’t. Does it have something to do with the matrix size?
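Absent a hard rule, a pragmatic check is to solve both forms once on a representative instance and compare solve times. A sketch (`build_model` here is a hypothetical stand-in for your own model-building function, which would accept the optimizer as an argument):

```julia
using JuMP, MosekTools, Dualization

# Try the primal form and the dualized form, and let the timings decide.
for opt in (Mosek.Optimizer, Dualization.dual_optimizer(Mosek.Optimizer))
    model = build_model(opt)  # hypothetical: construct your model with this optimizer
    optimize!(model)
    println(solve_time(model), " s")
end
```

Measuring on your actual problem sizes is usually more reliable than guessing from the dimensions alone.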

See this section of the Mosek docs: 7 Practical optimization — MOSEK Modeling Cookbook 3.4.0