Bottleneck of JuMP

I am solving an SDP problem using JuMP and the associated MosekTools.
The basic pattern is this:

NQ = 9
Nk = 9
nred = 4
Dmatrix = Vector{Any}(undef, 3)
Dmatrix[1] = Vector{Any}(undef, NQ) # uuuu channel
Dmatrix[2] = Vector{Any}(undef, NQ) # dddd channel
Dmatrix[3] = Vector{Any}(undef, NQ) # uddu channel
model = Model(Mosek.Optimizer)
# This is the D constraint
for jq in 1:NQ
    nred = length(k_pairs_dic[jq])

    Dmatrix[1][jq] = @variable(model, [1:nred, 1:nred], Hermitian)
    Dmatrix[2][jq] = @variable(model, [1:nred, 1:nred], Hermitian)
    Dmatrix[3][jq] = @variable(model, [1:Nk,   1:Nk  ], Hermitian)

    @constraint(model, Hermitian((Dmatrix[1][jq] + Dmatrix[1][jq]') / 2) in HermitianPSDCone())
    @constraint(model, Hermitian((Dmatrix[2][jq] + Dmatrix[2][jq]') / 2) in HermitianPSDCone())
    @constraint(model, Hermitian((Dmatrix[3][jq] + Dmatrix[3][jq]') / 2) in HermitianPSDCone())
end

I am initializing a bunch of matrices and declaring them to be Hermitian variables. I then collect all the variables into a vector:
Dvec = vectorize_D(Dmatrix, ...)

Then I construct a number of mapping matrices from Dvec to other variables, unvectorize the results, organize them into matrices, and impose constraints that those matrices be PSD. The problem looks like this:

Amatrix = construct_map(...)

other_vec = Amatrix * Dvec

other_matrix = unvectorize(...)

@constraint(model, Hermitian(other_matrix) in HermitianPSDCone())
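For reference, the vectorize / linear-map / unvectorize pattern above can be checked in isolation with plain numeric matrices (no JuMP). Here `vectorize` and `unvectorize` are hypothetical stand-ins for the helpers in the question, and the map is taken to be the identity so the round trip is easy to verify:

```julia
using LinearAlgebra
using SparseArrays

# Hypothetical stand-ins for the vectorize_D / unvectorize helpers in the
# question: plain column-major vectorization and its inverse.
vectorize(M) = vec(Matrix(M))
unvectorize(v, n) = reshape(v, n, n)

n = 4
A = rand(ComplexF64, n, n)
D = (A + A') / 2                      # numeric Hermitian stand-in for a variable block
Dvec = vectorize(D)

# The identity map is the simplest sparse Amatrix, so other_vec == Dvec.
Amatrix = sparse(1.0I, n^2, n^2)
other_vec = Amatrix * Dvec
other_matrix = unvectorize(other_vec, n)

# Hermitianize the result before the PSD constraint, as in the question.
other_herm = (other_matrix + other_matrix') / 2
@assert other_herm ≈ D
```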

At the end of the day, I am imposing PSD constraints on one 120 by 120 matrix, one 81 by 81 matrix, and around fifty 15 by 15 matrices. This has already become extremely expensive and slow to solve, and I often get an OOM error when I call

optimize!(model)

I want to point out that everything before optimize!(model) seems fine: when I construct the linear mapping matrix Amatrix or other_vec, there are no errors. Also, for reference, a problem of similar size in MATLAB using YALMIP and Mosek takes less than a minute. I am not sure what the bottleneck is here. Is this workflow even correct? What could be limiting the speed and memory usage?

Hi @ttx002000, it’s really impossible to say without a reproducible example.

Do you get a log from Mosek before the OOM?

Have you tried a different solver like SCS.jl?

I think a minimal working example is this. Essentially I am declaring 4000 variables and creating a sparse linear map from these 4000 variables to a 400 by 400 matrix (which is actually a vectorized set of 400^2 entries), sparse meaning that each entry of the matrix depends on only a few of the 4000 variables. Then I require this 400 by 400 matrix to be PSD. This piece of code was generated by ChatGPT, but I think it captures the essence of the problem I am encountering; the real code has a very similar structure.

Calling this function produces an OOM error on my computer, but a similar-sized problem in MATLAB using YALMIP/MOSEK doesn't have this problem.

using SparseArrays
using LinearAlgebra
using Random
using JuMP
using MosekTools

function minimal_psd_test(;
    nvar::Int = 4000,
    nmat::Int = 400,
    entries_per_row::Int = 8,
    seed::Int = 1234,
)
    Random.seed!(seed)

    ntri = nmat * (nmat + 1) ÷ 2

    rows = Int[]
    cols = Int[]
    vals = Float64[]

    for r in 1:ntri
        chosen = rand(1:nvar, entries_per_row)
        for c in chosen
            push!(rows, r)
            push!(cols, c)
            push!(vals, randn())
        end
    end

    A = sparse(rows, cols, vals, ntri, nvar)
    b = 0.01 .* randn(ntri)
    c = randn(nvar)

    I, J, V = findnz(A)
    row_terms = [Vector{Tuple{Int,Float64}}() for _ in 1:ntri]
    for t in eachindex(I)
        push!(row_terms[I[t]], (J[t], V[t]))
    end

    # IMPORTANT: not direct_model
    model = Model(Mosek.Optimizer)
    set_attribute(model, "MSK_DPAR_INTPNT_CO_TOL_PFEAS", 1e-10)
    set_attribute(model, "MSK_DPAR_INTPNT_CO_TOL_DFEAS", 1e-10)
    set_attribute(model, "MSK_DPAR_INTPNT_CO_TOL_REL_GAP", 1e-10)
    set_attribute(model, "MSK_IPAR_LOG", 10)
    set_attribute(model, "MSK_IPAR_LOG_INTPNT", 10)
    set_attribute(model, "MSK_IPAR_LOG_STORAGE", 1)
    set_attribute(model, "MSK_IPAR_LOG_CUT_SECOND_OPT", 1)

    set_string_names_on_creation(model, false)

    @variable(model, x[1:nvar])

    y = Vector{AffExpr}(undef, ntri)
    for r in 1:ntri
        expr = AffExpr(b[r])
        for (j, v) in row_terms[r]
            add_to_expression!(expr, v, x[j])
        end
        y[r] = expr
    end

    Y = Matrix{AffExpr}(undef, nmat, nmat)
    idx = 1
    for j in 1:nmat
        for i in 1:j
            Y[i, j] = y[idx]
            Y[j, i] = y[idx]
            idx += 1
        end
    end

    @constraint(model, Symmetric(Y) in PSDCone())
    @objective(model, Min, sum(c[i] * x[i] for i in 1:nvar))

    println("Finished building model.")
    println("About to optimize...")

    optimize!(model)

    println("termination_status = ", termination_status(model))
    println("primal_status      = ", primal_status(model))
    println("objective_value    = ", objective_value(model))

    return model
end
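As a side note, the column-wise upper-triangle packing used to build `Y` can be checked on its own with plain numbers; `unpack_upper` below is just an illustrative helper mirroring the double loop in the example:

```julia
using LinearAlgebra

# Mirror of the loop in the example: unpack ntri = n(n+1)/2 entries
# (column-wise upper triangle) into a full symmetric matrix.
function unpack_upper(y::AbstractVector, n::Int)
    Y = zeros(eltype(y), n, n)
    idx = 1
    for j in 1:n, i in 1:j
        Y[i, j] = y[idx]
        Y[j, i] = y[idx]
        idx += 1
    end
    return Y
end

n = 4
ntri = n * (n + 1) ÷ 2        # 10 packed entries for a 4×4 symmetric matrix
Y = unpack_upper(collect(1.0:ntri), n)
@assert issymmetric(Y)
@assert Y[2, 2] == 3.0        # diagonal entries land at indices 1, 3, 6, 10
```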

Do you get a log from Mosek? Does it OOM with SCS?

Hi, Mosek gives error 1051. And I am not sure why I should switch to SCS? I think MATLAB also calls Mosek and doesn’t have this issue.

Do you get a log from Mosek?

Mosek gives error 1051

Can you post the full print out of running Mosek? When does this error occur? What is the stack trace?

Does it OOM with SCS?

And I am not sure why I should switch to SCS?

If it errors with SCS, it means the issue is somewhere in JuMP or MathOptInterface.

If it works in SCS but not with Mosek, it means the issue is in MosekTools.jl or Mosek proper.

Now, I can’t run your code because I don’t have a Mosek license, but it works no problem with SCS, which makes me think it is only Mosek. But if it fails with SCS it might be your machine.

I can also get my original code (not this toy model) to work with SCS. This is very weird. Why would Mosek work for MATLAB but not for Julia? For anyone interested, this is the error I get when I use Mosek for this toy code:

About to optimize...
Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 0               
  Affine conic cons.     : 1 (80200 rows)
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 4000            
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.02            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.09    
Optimizer terminated. Time: 72.01   

Mosek.MosekError(1051, "")

Stacktrace:
  [1] optimize
    @ C:\Users\10519\.julia\packages\Mosek\Gzljt\src\msk_functions.jl:2399 [inlined]
  [2] optimize!(m::MosekTools.Optimizer)
    @ MosekTools C:\Users\10519\.julia\packages\MosekTools\hgngi\src\MosekTools.jl:356
  [3] optimize!
    @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Bridges\bridge_optimizer.jl:367 [inlined]
  [4] optimize!
    @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\MathOptInterface.jl:122 [inlined]
  [5] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
    @ MathOptInterface.Utilities C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Utilities\cachingoptimizer.jl:370
  [6] optimize!(model::Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::@Kwargs{})
    @ JuMP C:\Users\10519\.julia\packages\JuMP\0tD10\src\optimizer_interface.jl:610
  [7] optimize!
    @ C:\Users\10519\.julia\packages\JuMP\0tD10\src\optimizer_interface.jl:560 [inlined]
  [8] minimal_psd_test(; nvar::Int64, nmat::Int64, entries_per_row::Int64, seed::Int64)
    @ Main .\In[59]:79
  [9] minimal_psd_test()
    @ Main .\In[59]:7
 [10] top-level scope
    @ In[60]:1
Some type information was truncated. Use `show(err)` to see complete types.

Since you’re using MOSEK and the same type of model seems to work in MATLAB, it might also be worth sending a minimal reproducible example to MOSEK support.

If a commercial solver is your main path, official technical support is really part of its value.

The Julia community can help narrow down whether the issue lies in JuMP / MosekTools, while MOSEK support would be in a better position to confirm whether this is a solver-side issue.


Here’s a related reply suggesting you try smaller-scale problems.

From personal experience: the input data itself occupies some memory, the JuMP model occupies some memory, and when the solver starts to run, it uses more memory again.

If you just want to solve the optimization problem, you can consider freeing the memory of the primitive data after the JuMP model has been constructed, e.g. by dropping references to it and calling GC.gc(). (Although I’m not sure Julia can really return that exact amount of memory to the OS.)
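A minimal sketch of that idea, with no solver involved; `build_model` is a hypothetical placeholder for whatever constructs the JuMP model from the raw data:

```julia
# Sketch: drop references to large primitive data once the model is built,
# then ask the GC to collect before the solve starts.
function solve_with_freed_data(build_model)
    raw = randn(1_000, 1_000)      # stand-in for large input data
    model = build_model(raw)       # the model keeps only what it copied
    raw = nothing                  # release our reference to the raw data
    GC.gc()                        # request a collection before the solver runs
    return model
end

# Toy "model": a summary statistic, so the sketch runs without a solver.
result = solve_with_freed_data(raw -> sum(abs2, raw))
@assert result > 0
```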

This is not where your problem is, but this code makes no sense. You can simply do

            Dmatrix[1][jq] = @variable(model, [1:nred, 1:nred] in HermitianPSDCone())
            Dmatrix[2][jq] = @variable(model, [1:nred, 1:nred] in HermitianPSDCone())
            Dmatrix[3][jq] = @variable(model, [1:Nk,   1:Nk  ] in HermitianPSDCone())

Why would Mosek work for MATLAB but not for Julia?

Have you verified that they are exactly the same model? What’s the log output when you run the MATLAB model?

This doesn’t seem to be a JuMP or Julia issue. We’ve formulated the model and passed it to Mosek, and the error looks like it’s an OOM issue somewhere in their solve routine.

You should reach out to Mosek. You can send them the offending file with JuMP.write_to_file(model, "model.cbf")

It’s also unrelated to the OOM issue, but I’d write your code as:

using JuMP
import Random
import SCS
import SparseArrays
function minimal_psd_test(;
    nvar::Int = 4000,
    nmat::Int = 400,
    entries_per_row::Int = 8,
    seed::Int = 1234,
    optimizer::Any = SCS.Optimizer,
)
    Random.seed!(seed)
    ntri = nmat * (nmat + 1) ÷ 2
    rows = repeat(1:ntri; inner = entries_per_row)
    cols = rand(1:nvar, ntri * entries_per_row)
    vals = randn(ntri * entries_per_row)
    A = SparseArrays.sparse(rows, cols, vals, ntri, nvar)
    b = 0.01 .* randn(ntri)
    c = randn(nvar)
    model = Model(optimizer)
    @variable(model, x[1:nvar])
    @expression(model, y, A * x + b)
    Y = reshape_vector(y, SymmetricMatrixShape(nmat))
    @constraint(model, Y in PSDCone())
    @objective(model, Min, c' * x)
    optimize!(model)
    display(solution_summary(model))
    return model
end
minimal_psd_test()

I don’t think it is MOSEK’s issue, because here is MATLAB code that does the same thing using MOSEK, and it runs fine. Asking ChatGPT a bit more, it was suggesting it’s how JuMP is doing dualization. Do you have an idea of a workaround?

function [sol, xval, objval, timing] = minimal_psd_test_matlab(varargin)

    total_tic = tic;
    fprintf('Start function\n');

    fprintf('Parsing inputs...\n');
    p = inputParser;
    addParameter(p, 'nvar', 4000, @(x) isnumeric(x) && isscalar(x) && x > 0);
    addParameter(p, 'nmat', 400,  @(x) isnumeric(x) && isscalar(x) && x > 0);
    addParameter(p, 'entries_per_row', 8, @(x) isnumeric(x) && isscalar(x) && x > 0);
    addParameter(p, 'seed', 1234, @(x) isnumeric(x) && isscalar(x));
    parse(p, varargin{:});

    nvar = p.Results.nvar;
    nmat = p.Results.nmat;
    entries_per_row = p.Results.entries_per_row;
    seed = p.Results.seed;

    rng(seed);
    fprintf('Finished parsing inputs\n');

    build_tic = tic;

    % -------------------------------------------------
    % numeric data
    % -------------------------------------------------
    fprintf('Building random sparse data...\n');
    ntri = nmat * (nmat + 1) / 2;

    rows = zeros(ntri * entries_per_row, 1);
    cols = zeros(ntri * entries_per_row, 1);
    vals = zeros(ntri * entries_per_row, 1);

    t = 0;
    for r = 1:ntri
        if mod(r, max(1, floor(ntri/10))) == 0
            fprintf('  sparse data row %d / %d\n', r, ntri);
        end
        chosen = randi(nvar, entries_per_row, 1);
        for jj = 1:entries_per_row
            t = t + 1;
            rows(t) = r;
            cols(t) = chosen(jj);
            vals(t) = randn();
        end
    end

    A = sparse(rows, cols, vals, ntri, nvar);
    b = 0.01 * randn(ntri, 1);
    c = randn(nvar, 1);
    fprintf('Finished building random sparse data\n');

    % -------------------------------------------------
    % variables
    % -------------------------------------------------
    fprintf('Creating YALMIP variable x...\n');
    x = sdpvar(nvar, 1);
    fprintf('Finished creating x\n');

    fprintf('Building affine vector y = A*x + b ...\n');
    y = A*x + b;
    fprintf('Finished building y\n');

    % -------------------------------------------------
    % build symmetric Y WITHOUT scalar symbolic assignment
    % -------------------------------------------------
    fprintf('Building index map for upper triangle...\n');
    I = zeros(ntri,1);
    J = zeros(ntri,1);

    idx = 1;
    for j = 1:nmat
        if mod(j, max(1, floor(nmat/10))) == 0
            fprintf('  index column %d / %d\n', j, nmat);
        end
        for i = 1:j
            I(idx) = i;
            J(idx) = j;
            idx = idx + 1;
        end
    end
    fprintf('Finished building upper-triangle indices\n');

    fprintf('Assembling symbolic sparse upper triangle...\n');
    Yupper = sparse(I, J, y, nmat, nmat);
    fprintf('Symmetrizing Y...\n');
    Y = Yupper + triu(Yupper,1)';
    fprintf('Finished building Y\n');

    % -------------------------------------------------
    % model
    % -------------------------------------------------
    fprintf('Building constraints and objective...\n');
    Constraints = [Y >= 0];
    Objective = c' * x;
    fprintf('Finished constraints and objective\n');

    fprintf('Setting MOSEK options...\n');
    ops = sdpsettings( ...
        'solver', 'mosek', ...
        'verbose', 1, ...
        'debug', 1, ...
        'savesolveroutput', 1, ...
        'savesolverinput', 1);

    ops.mosek.MSK_DPAR_INTPNT_CO_TOL_PFEAS   = 1e-10;
    ops.mosek.MSK_DPAR_INTPNT_CO_TOL_DFEAS   = 1e-10;
    ops.mosek.MSK_DPAR_INTPNT_CO_TOL_REL_GAP = 1e-10;
    ops.mosek.MSK_IPAR_LOG                    = 10;
    ops.mosek.MSK_IPAR_LOG_INTPNT             = 10;
    ops.mosek.MSK_IPAR_LOG_STORAGE            = 1;
    ops.mosek.MSK_IPAR_LOG_CUT_SECOND_OPT     = 1;
    fprintf('Finished setting MOSEK options\n');

    build_time = toc(build_tic);
    fprintf('Finished building model. Build time = %.6f s\n', build_time);

    fprintf('About to optimize...\n');
    optimize_tic = tic;
    sol = optimize(Constraints, Objective, ops);
    optimize_time = toc(optimize_tic);

    fprintf('Finished optimize\n');
    fprintf('problem            = %d\n', sol.problem);
    fprintf('info               = %s\n', sol.info);
    fprintf('Optimize time      = %.6f s\n', optimize_time);

    if sol.problem == 0
        fprintf('Extracting solution...\n');
        xval = value(x);
        objval = value(Objective);
        fprintf('objective_value    = %.16g\n', objval);
    else
        xval = [];
        objval = NaN;
        fprintf('objective_value    = NaN\n');
    end

    total_time = toc(total_tic);
    fprintf('Total time         = %.6f s\n', total_time);

    timing.build_time = build_time;
    timing.optimize_time = optimize_time;
    timing.total_time = total_time;
end

was suggesting it’s how JuMP is doing dualization

JuMP doesn’t dualise by default.

Can you please post the log output from MATLAB.

I talked to Mosek support. The difference is that YALMIP automatically dualizes the problem, Mosek doesn’t, and that solving the dual is much more efficient than the primal.

See this tutorial: Dualization · JuMP

Here’s the code that works:

julia> using JuMP

julia> using MosekTools

julia> import Random

julia> import SparseArrays

julia> import Dualization

julia> function minimal_psd_test(;
           nvar::Int = 4000,
           nmat::Int = 400,
           entries_per_row::Int = 8,
           seed::Int = 1234,
       )
           Random.seed!(seed)
           ntri = nmat * (nmat + 1) ÷ 2
           rows = repeat(1:ntri; inner = entries_per_row)
           cols = rand(1:nvar, ntri * entries_per_row)
           vals = randn(ntri * entries_per_row)
           A = SparseArrays.sparse(rows, cols, vals, ntri, nvar)
           b = 0.01 .* randn(ntri)
           c = randn(nvar)
           model = Model(Dualization.dual_optimizer(Mosek.Optimizer))
           @variable(model, x[1:nvar])
           @expression(model, y, A * x + b)
           Y = reshape_vector(y, SymmetricMatrixShape(nmat))
           @constraint(model, Y in PSDCone())
           @objective(model, Min, c' * x)
           optimize!(model)
           display(solution_summary(model))
           return model
       end
minimal_psd_test (generic function with 1 method)

julia> minimal_psd_test()
Problem
  Name                   :                 
  Objective sense        : maximize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 4000            
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 0               
  Matrix variables       : 1 (scalarized: 80200)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.00            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.00    
Optimizer  - threads                : 10              
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 4000            
Optimizer  - Cones                  : 0               
Optimizer  - Scalar variables       : 0                 conic                  : 0               
Optimizer  - Semi-definite variables: 1                 scalarized             : 80200           
Factor     - setup time             : 0.20            
Factor     - dense det. time        : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 8.00e+06          after factor           : 8.00e+06        
Factor     - dense dim.             : 0                 flops                  : 1.67e+11        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.4e+00  1.0e+00  9.1e-01  0.00e+00   8.883151514e-02   -0.000000000e+00  1.0e+00  0.36  
1   1.8e-02  1.3e-02  6.6e-02  -9.41e-01  3.480810414e+01   3.464322771e+00   1.3e-02  1.40  
2   4.1e-07  3.1e-07  6.3e-04  -1.00e+00  5.533940656e+06   -2.211702459e+01  2.9e-07  2.33  
3   8.0e-14  5.2e-14  1.8e-11  -1.00e+00  9.529365688e-01   -1.082211814e-12  5.1e-14  3.10  
Optimizer terminated. Time: 3.10    

solution_summary(; result = 1, verbose = false)
├ solver_name          : Dual model with Mosek attached
├ Termination
│ ├ termination_status : INFEASIBLE
│ ├ result_count       : 1
│ ├ raw_status         : Mosek.MSK_SOL_STA_DUAL_INFEAS_CER
│ └ objective_bound    : NaN
├ Solution (result = 1)
│ ├ primal_status        : INFEASIBILITY_CERTIFICATE
│ ├ dual_status          : NO_SOLUTION
│ ├ objective_value      : 9.52937e-01
│ ├ dual_objective_value : -1.08221e-12
│ └ relative_gap         : NaN
└ Work counters
  ├ solve_time (sec)   : 3.09692e+00
  ├ simplex_iterations : 0
  ├ barrier_iterations : 3
  └ node_count         : 0
A JuMP Model
├ solver: Dual model with Mosek attached
├ objective_sense: MIN_SENSE
│ └ objective_function_type: AffExpr
├ num_variables: 4000
├ num_constraints: 1
│ └ Vector{AffExpr} in MOI.PositiveSemidefiniteConeTriangle: 1
└ Names registered in the model
  └ :x, :y

Do you know how YALMIP decides to solve the dual problem?

There’s no decision, it just always dualizes.


Here’s the code:


Hi Oscar,
Thanks for the reply. But I think the dualization must be doing something very weird for the Hermitian variables I am actually considering. For real variables it seems to work OK. But let me attach the following two examples, which really capture the essence and the size of the problem I encountered. The first is a piece of Julia code. I declare a bunch of Hermitian matrices called Dmatrix, then vectorize them into Dvec. Then, using a sparse map M, I map Dvec into T2vec, which is reshaped into a 300 by 300 matrix, Hermitianized, and required to be PSD. I then optimize some random linear function of Dvec. In the real problem, the map and the reshaping into the 300 by 300 matrix are done in a particular way, but that’s not super important here, so I just make the map a random sparse matrix.

using LinearAlgebra
using SparseArrays
using Random
using JuMP
using MosekTools
using Dualization

function minimal_psd_test(; use_dual=false, seed=1234)
    Random.seed!(seed)

    # Rough toy sizes only
    NQ   = 12          # number of q sectors
    nred = 8           # size of the two small Hermitian blocks
    nbig = 18          # size of the one larger Hermitian block
    t2n  = 300         # T2 matrix size
    nzpr = 6           # sparse-map entries per row

    # Total length of vectorized D variables
    nd = NQ*nred^2 + NQ*nred^2 + NQ*nbig^2

    # Random sparse map M : Dvec -> T2vec
    rows = Int[]
    cols = Int[]
    vals = Float64[]
    for r in 1:(t2n*t2n)
        js = rand(1:nd, nzpr)
        vs = randn(nzpr)
        for k in 1:nzpr
            push!(rows, r)
            push!(cols, js[k])
            push!(vals, vs[k])
        end
    end
    M = sparse(rows, cols, vals, t2n*t2n, nd)

    # Random objective coefficients
    c = randn(nd)

    if use_dual
        model = Model(Dualization.dual_optimizer(Mosek.Optimizer))
    else
        model = Model(Mosek.Optimizer)
    end

    set_string_names_on_creation(model, false)
    set_attribute(model, "MSK_IPAR_LOG", 10)

    # Analog of your Dmatrix
    Dmatrix = Vector{Any}(undef, 3)
    Dmatrix[1] = Vector{Any}(undef, NQ)
    Dmatrix[2] = Vector{Any}(undef, NQ)
    Dmatrix[3] = Vector{Any}(undef, NQ)

    for q in 1:NQ
        Dmatrix[1][q] = @variable(model, [1:nred, 1:nred], Hermitian)
        Dmatrix[2][q] = @variable(model, [1:nred, 1:nred], Hermitian)
        Dmatrix[3][q] = @variable(model, [1:nbig, 1:nbig], Hermitian)

        @constraint(model, Hermitian(Dmatrix[1][q]) in HermitianPSDCone())
        @constraint(model, Hermitian(Dmatrix[2][q]) in HermitianPSDCone())
        @constraint(model, Hermitian(Dmatrix[3][q]) in HermitianPSDCone())
    end

    # Vectorize in the same spirit as your real code
    Dvec = vcat(
        [vec(Dmatrix[1][q]) for q in 1:NQ]...,
        [vec(Dmatrix[2][q]) for q in 1:NQ]...,
        [vec(Dmatrix[3][q]) for q in 1:NQ]...
    )

    # Sparse affine map -> large matrix
    T2vec = M * Dvec
    T2raw = reshape(T2vec, t2n, t2n)
    T2herm = 0.5 * (T2raw + T2raw')

    @constraint(model, Hermitian(T2herm) in HermitianPSDCone())

    @objective(model, Min, sum(c[i] * real(Dvec[i]) for i in 1:nd))

  

    optimize!(model)

    return model
end

Now, either with or without dualization, the above piece of code produces an OOM error. I attach the error message below (using the dual):

Problem
  Name                   :                 
  Objective sense        : maximize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 101556          
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 0               
  Matrix variables       : 37 (scalarized: 191556)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.02            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.02    
Optimizer terminated. Time: 27.53   


Mosek.MosekError(1051, "")

Stacktrace:
  [1] optimize
    @ C:\Users\10519\.julia\packages\Mosek\Gzljt\src\msk_functions.jl:2399 [inlined]
  [2] optimize!(m::MosekTools.Optimizer)
    @ MosekTools C:\Users\10519\.julia\packages\MosekTools\hgngi\src\MosekTools.jl:356
  [3] optimize!(dest::MosekTools.Optimizer, src::MathOptInterface.Utilities.UniversalFallback{…})
    @ MathOptInterface C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\MathOptInterface.jl:122
  [4] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
    @ MathOptInterface.Utilities C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Utilities\cachingoptimizer.jl:370
  [5] optimize!
    @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Bridges\bridge_optimizer.jl:367 [inlined]
  [6] optimize!
    @ C:\Users\10519\.julia\packages\Dualization\WlGYB\src\MOI_wrapper.jl:255 [inlined]
  [7] optimize!
    @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\MathOptInterface.jl:122 [inlined]
  [8] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
    @ MathOptInterface.Utilities C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Utilities\cachingoptimizer.jl:370
  [9] optimize!
    @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Bridges\bridge_optimizer.jl:367 [inlined]
 [10] optimize!
    @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\MathOptInterface.jl:122 [inlined]
 [11] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
    @ MathOptInterface.Utilities C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Utilities\cachingoptimizer.jl:370
 [12] optimize!(model::Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::@Kwargs{})
    @ JuMP C:\Users\10519\.julia\packages\JuMP\0tD10\src\optimizer_interface.jl:610
 [13] optimize!
    @ C:\Users\10519\.julia\packages\JuMP\0tD10\src\optimizer_interface.jl:560 [inlined]
 [14] minimal_psd_test(; use_dual::Bool, seed::Int64)
    @ Main .\In[11]:84
 [15] top-level scope
    @ In[13]:1
Some type information was truncated. Use `show(err)` to see complete types.

Here is the error message when not using the dual:

Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 0               
  Affine conic cons.     : 37 (191556 rows)
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 5424            
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.06            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.37    
Optimizer terminated. Time: 3.06    


Mosek.MosekError(1051, "")

Stacktrace:
 [1] optimize
   @ C:\Users\10519\.julia\packages\Mosek\Gzljt\src\msk_functions.jl:2399 [inlined]
 [2] optimize!(m::MosekTools.Optimizer)
   @ MosekTools C:\Users\10519\.julia\packages\MosekTools\hgngi\src\MosekTools.jl:356
 [3] optimize!
   @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Bridges\bridge_optimizer.jl:367 [inlined]
 [4] optimize!
   @ C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\MathOptInterface.jl:122 [inlined]
 [5] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{…})
   @ MathOptInterface.Utilities C:\Users\10519\.julia\packages\MathOptInterface\7aTit\src\Utilities\cachingoptimizer.jl:370
 [6] optimize!(model::Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::@Kwargs{})
   @ JuMP C:\Users\10519\.julia\packages\JuMP\0tD10\src\optimizer_interface.jl:610
 [7] optimize!
   @ C:\Users\10519\.julia\packages\JuMP\0tD10\src\optimizer_interface.jl:560 [inlined]
 [8] minimal_psd_test(; use_dual::Bool, seed::Int64)
   @ Main .\In[11]:84
 [9] top-level scope
   @ In[14]:1
Some type information was truncated. Use `show(err)` to see complete types.

Now, one can use SCS as the optimizer instead; that executes through for this particular example in about 2 minutes. But let’s stick with Mosek (I have specific reasons to use Mosek rather than SCS: for some other problems Mosek seems to converge better), since MATLAB/YALMIP apparently uses Mosek in a different way. The same problem, written in MATLAB using YALMIP and Mosek, is the following:

function [sol, info] = minimal_psd_test_matlab()

rng(1234);

% rough toy sizes
NQ   = 12;
nred = 8;
nbig = 18;
t2n  = 300;
nzpr = 6;

nd = NQ*nred^2 + NQ*nred^2 + NQ*nbig^2;

% sparse random map M : Dvec -> T2vec
nrow = t2n*t2n;
rows = zeros(nrow*nzpr,1);
cols = zeros(nrow*nzpr,1);
vals = zeros(nrow*nzpr,1);

ptr = 1;
for r = 1:nrow
    js = randi(nd, nzpr, 1);
    vs = randn(nzpr, 1);
    idx = ptr:(ptr+nzpr-1);
    rows(idx) = r;
    cols(idx) = js;
    vals(idx) = vs;
    ptr = ptr + nzpr;
end

M = sparse(rows, cols, vals, nrow, nd);

% random objective coefficients
c = randn(nd,1);

yalmip('clear');

Constraints = [];

% analog of Dmatrix
Dmatrix = cell(3, NQ);

for q = 1:NQ
    Dmatrix{1,q} = sdpvar(nred, nred, 'hermitian', 'complex');
    Dmatrix{2,q} = sdpvar(nred, nred, 'hermitian', 'complex');
    Dmatrix{3,q} = sdpvar(nbig, nbig, 'hermitian', 'complex');

    Constraints = [Constraints, Dmatrix{1,q} >= 0];
    Constraints = [Constraints, Dmatrix{2,q} >= 0];
    Constraints = [Constraints, Dmatrix{3,q} >= 0];
end

Dvec = [];
for q = 1:NQ
    Dvec = [Dvec; Dmatrix{1,q}(:)];
end
for q = 1:NQ
    Dvec = [Dvec; Dmatrix{2,q}(:)];
end
for q = 1:NQ
    Dvec = [Dvec; Dmatrix{3,q}(:)];
end

T2vec  = M * Dvec;
T2raw  = reshape(T2vec, t2n, t2n);
T2herm = 0.5 * (T2raw + T2raw');

Constraints = [Constraints, T2herm >= 0];

Objective = real(c' * Dvec);

ops = sdpsettings('solver','mosek','verbose',1);

fprintf('NQ     = %d\n', NQ);
fprintf('nred   = %d\n', nred);
fprintf('nbig   = %d\n', nbig);
fprintf('nd     = %d\n', nd);
fprintf('t2n    = %d\n', t2n);
fprintf('nnz(M) = %d\n', nnz(M));

sol = optimize(Constraints, Objective, ops);

info.problem = sol.problem;
info.info = sol.info;
info.yalmiptime = sol.yalmiptime;
info.solvertime = sol.solvertime;

if sol.problem == 0
    info.objective_value = value(Objective);
else
    info.objective_value = NaN;
end

end

Calling this function in MATLAB gives the following output. The optimization succeeds and has no OOM issue:

NQ     = 12
nred   = 8
nbig   = 18
nd     = 5424
t2n    = 300
nnz(M) = 539748

MOSEK Version 11.1.10 (Build date: 2026-3-16 11:37:32)
Copyright (c) MOSEK ApS, Denmark WWW: mosek.com
Platform: Windows/64-X86

Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 5424            
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 0               
  Matrix variables       : 37 (scalarized: 191556)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.00            
Lin. dep.  - primal attempts        : 1                 successes              : 1               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.00    
Optimizer  - threads                : 8               
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 5424            
Optimizer  - Cones                  : 0               
Optimizer  - Scalar variables       : 0                 conic                  : 0               
Optimizer  - Semi-definite variables: 37                scalarized             : 191556          
Factor     - setup time             : 1.63            
Factor     - dense det. time        : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 1.47e+07          after factor           : 1.47e+07        
Factor     - dense dim.             : 0                 flops                  : 9.02e+11        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   5.9e+00  1.0e+00  1.0e+00  0.00e+00   0.000000000e+00   0.000000000e+00   1.0e+00  3.33  
1   2.3e-01  4.0e-02  1.1e-01  -9.39e-01  0.000000000e+00   7.579916161e+00   4.0e-02  15.32 
2   2.1e-04  3.5e-05  1.3e-06  9.52e-01   0.000000000e+00   9.472120673e-04   3.5e-05  27.54 
3   1.2e-14  8.0e-19  1.8e-26  1.00e+00   0.000000000e+00   3.575075856e-16   8.0e-19  39.53 
Optimizer terminated. Time: 39.54   


Interior-point solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL
  Primal.  obj: 0.0000000000e+00    nrm: 1e+01    Viol.  con: 3e-13    barvar: 0e+00  
  Dual.    obj: 3.5750758560e-16    nrm: 8e-18    Viol.  con: 0e+00    barvar: 7e-18  
Optimizer summary
  Optimizer                 -                        time: 39.54   
    Interior-point          - iterations : 3         time: 39.54   
    Simplex                 - iterations : 0         time: 0.00    
    Mixed integer           - relaxations: 0         time: 0.00    

So I am now rather confused: what exactly does YALMIP do that JuMP doesn’t, which saves all this memory? It must be something beyond dualization that makes YALMIP so much more memory efficient.

On my computer, running your code without dualization takes about 300 GB of RAM, and with dualization about 70 GB, so it does help. But you’re right: JuMP is doing something wrong here; it shouldn’t be so much less efficient than YALMIP.
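For completeness, "dualization" above means solving the dual problem by wrapping the optimizer with the Dualization.jl package; a sketch, assuming Dualization.jl and MosekTools are installed:

```julia
using JuMP, Dualization, MosekTools

# Build the model against the dual of the problem that is sent to Mosek.
# For SDPs whose natural form has many PSD matrix variables, the dual
# formulation can be far cheaper in memory for a primal-form solver.
model = Model(dual_optimizer(Mosek.Optimizer))
```

Whether the primal or the dual form is cheaper depends on which side of the problem has fewer scalarized variables, which is exactly the primal-vs-LMI distinction discussed below.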

I do have a faint recollection that I was doing some tests a few years ago, and I couldn’t get JuMP to send the same model to MOSEK as YALMIP did. Maybe it was because YALMIP used the LMI mode, and JuMP the primal mode, as defined here: 6.7 Semidefinite Optimization — MOSEK Optimizer API for C 10.2.19
