Huge increase in variables/constraints with Convex.jl

I am new to Convex.jl and DCP in general.
I am currently working on a convex optimization problem with 27 variables constrained to be non-negative. I have formulated the problem in Convex and it shows the following problem statistics:

  problem is DCP         : true
  number of variables    : 1 (27 scalar elements)
  number of constraints  : 1 (27 scalar elements)

However, once I solve the problem with SCS it reports 10028 variables and 30028 constraints. While I expected some difference due to the reformulation that Convex performs, this seems like a very large increase.

------------------------------------------------------------------
               SCS v3.2.4 - Splitting Conic Solver
        (c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 10028, constraints m: 30028
cones:    l: linear vars: 28
          e: exp vars: 30000, dual exp vars: 0

The problem involves 2-d data mapped to a higher-dimensional feature space (27-d in this case). I have prepared an MWE with random data.

using Convex
using SCS

n = 27

a = 1000
b = 10000

A = rand(a, n)
B = rand(b, n)

V = 15

l = Variable(n)

con = l >= 0
problem = minimize(a * (log(V / b) + logsumexp((B * l) / -2)) + sum(A * l) / 2, con)
solve!(problem, SCS.Optimizer)

The first log term is constant and can safely be ignored.

Is the number of variables/constraints that SCS reports expected here? Is there something I should be doing differently in my formulation of the problem?

hm, this happens since we reformulate the logsumexp into an exponential cone constraint of the form (x[i], 1, t[i]) ∈ ExponentialCone() for each x[i] in the input, where t[i] is an auxiliary variable. Since size(B*l) = (10_000, 1), that means there are 10k exponential cone constraints, each with a corresponding auxiliary variable t[i]. Each exponential cone constraint involves 3 entries, so maybe that’s why we end up with SCS saying 30k constraints.

I’m not sure if there’s something we can do better or not here. I think either way we need order 10^4 constraints, since we have B*l of size 10k which needs to be constrained to a cone. But maybe there’s some formulation in which we don’t need 10^4 auxiliary variables while doing that?
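For what it's worth, the counts in the SCS header line up with this reformulation exactly; a quick back-of-the-envelope check (the 28 linear vars are the 27 entries of l >= 0 plus the single constraint on the sum of the auxiliaries):

```julia
b, n = 10_000, 27
vars = n + b + 1             # l, one auxiliary per row of B*l, plus the epigraph variable: 10_028
exp_rows = 3b                # each exponential cone contributes 3 rows: 30_000
lin_rows = n + 1             # l >= 0 plus the sum constraint on the auxiliaries: 28
(vars, exp_rows + lin_rows)  # (10028, 30028), matching the SCS log
```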

edit: to answer the question, I would say this is expected, and I don’t know a better formulation, but maybe someone else does.

4 Likes

See Tips and Tricks · JuMP

I think this is just an unavoidable formulation of logsumexp. This helps to show what is happening:

using Convex, JuMP, SCS

function solve_convex(B)
    b, n = size(B)
    l = Variable(n)
    problem = minimize(logsumexp(B * l), [l >= 0])
    solve!(problem, SCS.Optimizer)
    return problem.optval
end

# Want
#   log(sum(exp.(B * l)))
# need affine objective, introduce epigraph
#   log(sum(exp.(B * l))) <= t
# exp both sides
#   sum(exp.(B * l)) <= exp(t)
# divide by exp(t)
#   sum(exp.(B * l - t)) <= 1
# introduce epigraph of exp(B[i, :]' * l - t)
#   [B[i, :]' * l - t, 1, u[i]] in ExponentialCone() for all i
#   sum(u) <= 1
function solve_jump(B)
    b, n = size(B)
    model = Model(SCS.Optimizer)
    @variables(model, begin
        l[1:n] >= 0
        t
        u[1:b]
    end)
    @constraints(model, begin
        [i in 1:b], [B[i, :]' * l - t, 1, u[i]] in MOI.ExponentialCone()
        sum(u) <= 1
    end)
    @objective(model, Min, t)
    optimize!(model)
    return objective_value(model)
end
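The derivation in the comments can be sanity-checked numerically without a solver: at the optimum, t equals the logsumexp value, and the scaled sum hits 1 exactly.

```julia
x = randn(5)
t = log(sum(exp.(x)))  # the logsumexp value, i.e. the optimal epigraph t
sum(exp.(x .- t))      # = 1, so the constraint sum(exp.(x .- t)) <= 1 is tight at the optimum
```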

julia> B = rand(10_000, 27);

julia> solve_convex(B)
[ Info: [Convex.jl] Compilation finished: 0.07 seconds, 63.466 MiB of memory allocated
------------------------------------------------------------------
	       SCS v3.2.4 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 10028, constraints m: 30028
cones: 	  l: linear vars: 28
	  e: exp vars: 30000, dual exp vars: 0
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
	  compiled with openmp parallelization enabled
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 300027, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.00e+00  3.84e+02  5.49e+02  2.74e+02  1.00e-01  1.08e-01 
   250| 1.61e-01  8.79e-02  8.53e-02  8.57e+00  6.43e-03  7.21e-01 
   500| 9.40e-05  8.75e-06  1.04e-05  9.21e+00  6.43e-03  1.32e+00 
------------------------------------------------------------------
status:  solved
timings: total: 1.32e+00s = setup: 9.95e-02s + solve: 1.22e+00s
	 lin-sys: 6.82e-01s, cones: 3.44e-01s, accel: 4.79e-02s
------------------------------------------------------------------
objective = 9.210337
------------------------------------------------------------------
9.210331666844379

julia> solve_jump(B)
------------------------------------------------------------------
	       SCS v3.2.4 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 10028, constraints m: 30028
cones: 	  l: linear vars: 28
	  e: exp vars: 30000, dual exp vars: 0
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
	  compiled with openmp parallelization enabled
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 300027, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.00e+00  3.84e+02  5.49e+02  2.74e+02  1.00e-01  1.04e-01 
   250| 1.61e-01  8.79e-02  8.53e-02  8.57e+00  6.43e-03  7.15e-01 
   500| 9.40e-05  8.75e-06  1.04e-05  9.21e+00  6.43e-03  1.30e+00 
------------------------------------------------------------------
status:  solved
timings: total: 1.30e+00s = setup: 9.45e-02s + solve: 1.21e+00s
	 lin-sys: 6.90e-01s, cones: 3.30e-01s, accel: 4.79e-02s
------------------------------------------------------------------
objective = 9.210337
------------------------------------------------------------------
9.210331666844379

We need to introduce b + 1 variables, and there are 3b + 1 new rows.

1 Like

Thank you both for the clarifications.

Unfortunately it looks like Convex might not be the best fit for me in this case. I thought something specifically tailored to convex optimization problems would be the way to go, but JuMP and NLopt solve the problem significantly faster.

You might try this problem with Clarabel, staying within the convex programming framework. I’ve found it to be as fast (or sometimes faster) than hand rolled multinomial logit code (with analytic gradients and hessians), on problems with thousands of covariates.

1 Like

You could also use Ipopt:

julia> model = Model(Ipopt.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Ipopt

julia> @variable(model, l[1:n] >= 0);

julia> @objective(
           model, 
           Min, 
           a * (log(V / b) + log(sum(exp.((B * l) / -2)))) + sum(A * l) / 2,
       );

julia> optimize!(model)
This is Ipopt version 3.14.13, running with linear solver MUMPS 5.6.1.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:      378

Total number of variables............................:       27
                     variables with only lower bounds:       27
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  2.7085352e+03 0.00e+00 8.06e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  2.7085632e+03 0.00e+00 6.38e+00  -1.0 1.32e-01    -  2.34e-01 1.00e+00f  1
   2  2.7053320e+03 0.00e+00 1.33e+00  -1.0 1.88e-01    -  1.00e+00 6.63e-01f  1
   3  2.7060912e+03 0.00e+00 3.95e-04  -1.0 4.20e-02    -  1.00e+00 1.00e+00f  1
   4  2.7046346e+03 0.00e+00 4.77e-04  -2.5 3.39e-02    -  1.00e+00 1.00e+00f  1
   5  2.7044748e+03 0.00e+00 6.71e-05  -3.8 1.70e-02    -  1.00e+00 1.00e+00f  1
   6  2.7044617e+03 0.00e+00 7.16e-06  -3.8 7.48e-03    -  1.00e+00 1.00e+00f  1
   7  2.7044573e+03 0.00e+00 5.45e-07  -5.7 2.66e-03    -  1.00e+00 1.00e+00f  1
   8  2.7044571e+03 0.00e+00 7.63e-09  -5.7 3.84e-04    -  1.00e+00 1.00e+00f  1
   9  2.7044570e+03 0.00e+00 7.69e-12  -8.6 1.34e-05    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 9

                                   (scaled)                 (unscaled)
Objective...............:   2.7044570259870202e+03    2.7044570259870202e+03
Dual infeasibility......:   7.6931794268375597e-12    7.6931794268375597e-12
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Variable bound violation:   9.7193984301167896e-09    9.7193984301167896e-09
Complementarity.........:   6.2626849075435762e-09    6.2626849075435762e-09
Overall NLP error.......:   6.2626849075435762e-09    6.2626849075435762e-09


Number of objective function evaluations             = 10
Number of objective gradient evaluations             = 10
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 9
Total seconds in IPOPT                               = 0.908

EXIT: Optimal Solution Found.

julia> value.(l)
27-element Vector{Float64}:
  7.0347191920600054e-9
 -9.694921853255317e-9
  0.22066665576481762
 -9.625943768168848e-9
 -9.71939843011679e-9
  0.07915073669819384
 -9.602961653536133e-9
  0.0795289045675788
  0.13103237526003292
 -9.312149059022509e-9
 -2.3129631494442203e-9
 -9.259544870841831e-9
  0.034978324299520976
 -9.686542634680258e-9
 -9.263422571993028e-9
  0.2726662304153659
 -9.704162156528219e-9
 -9.649989140454418e-9
  0.14037497201613416
  0.05531501546057047
  0.29767433466343496
  0.17360469504770334
 -9.67403050984381e-9
 -9.433798273626065e-9
 -9.324904356693814e-9
  0.20871867222440957
 -7.267831349602901e-9

Thanks for the suggestions. Unfortunately Clarabel aborts early with “SLOW_PROGRESS” when using the real data instead of the random data in the MWE. Ipopt on the other hand converges very quickly and returns the correct result.

Thanks!

Worth saying also that the large increase in variables and constraints is not necessarily a problem. As @ericphanson says above, those variables come from remodelling the objective in terms of a large number of coupled cones of dimension 3. This means that the KKT system to be solved by SCS (or Clarabel, or Mosek, or whatever) can be very sparse, so the per-iteration time doesn't increase much at all.
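As a rough illustration of that sparsity (numbers taken from the SCS log earlier in the thread):

```julia
n, m = 10_028, 30_028
nnzA = 300_027    # nnz(A) reported by SCS
nnzA / (n * m)    # ≈ 0.001, so only ~0.1% of the constraint matrix A is nonzero
```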

1 Like

yeah, I can see how that is true. Unfortunately the only solver I could find that solves my problem fast enough using Convex is Mosek and I’d rather not rely on a paid/licensed solver. I am using Convex and SCS in a different project though, where they perform insanely well.