Largest number of variables that SCS.jl can handle

I was using JuMP.jl and SCS.jl to formulate and solve an optimization problem with 234,932 optimization variables and 251,088 constraints. However, I encountered the following error:

[113945] signal (11.1): Segmentation fault
ldl_prepare at /workspace/srcdir/scs/linsys/cpu/direct/private.c:34 [inlined]
scs_init_lin_sys_work at /workspace/srcdir/scs/linsys/cpu/direct/private.c:237
init_work at /workspace/srcdir/scs/src/scs.c:890 [inlined]
scs_init at /workspace/srcdir/scs/src/scs.c:1227
scs_init at /public1/home/user/.julia/packages/SCS/owpZW/src/linear_solvers/direct.jl:25 [inlined]
_unsafe_scs_solve at /public1/home/user/.julia/packages/SCS/owpZW/src/c_wrapper.jl:390
#scs_solve#13 at /public1/home/user/.julia/packages/SCS/owpZW/src/c_wrapper.jl:349
scs_solve at /public1/home/user/.julia/packages/SCS/owpZW/src/c_wrapper.jl:278
unknown function (ip: 0x2ab7e4d97b4f)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
optimize! at /public1/home/user/.julia/packages/SCS/owpZW/src/MOI_wrapper/MOI_wrapper.jl:366
unknown function (ip: 0x2ab7e4d9355b)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
optimize! at /public1/home/user/.julia/packages/SCS/owpZW/src/MOI_wrapper/MOI_wrapper.jl:440
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x2ab7e4d7a062)
unknown function (ip: 0x2ab7e4d62309)
unknown function (ip: 0x2ab7e4d622aa)
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/Bridges/bridge_optimizer.jl:376 [inlined]
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/MathOptInterface.jl:85 [inlined]
optimize! at /public1/home/user/.julia/packages/MathOptInterface/BlCD1/src/Utilities/cachingoptimizer.jl:316
unknown function (ip: 0x2ab7e4d62272)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
#optimize!#113 at /public1/home/user/.julia/packages/JuMP/ptoff/src/optimizer_interface.jl:440
optimize! at /public1/home/user/.julia/packages/JuMP/ptoff/src/optimizer_interface.jl:410
jfptr_optimizeNOT._2915 at /public1/home/user/.julia/compiled/v1.9/JuMP/DmXqY_u22Yc.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_call at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1864
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1924
include at ./client.jl:478
unknown function (ip: 0x2ab7e4cd8272)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_call at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1864
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1924
include at ./Base.jl:457
jfptr_include_43521.clone_1 at /public1/home/user/julia/julia-1.9.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
exec_options at ./client.jl:307
_start at ./client.jl:522
jfptr__start_37386.clone_1 at /public1/home/user/julia/julia-1.9.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
true_main at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 122536122659 (Pool: 122441856243; Big: 94266416); GC: 93

I want to note that there was enough memory available: my system has 2 TB of memory in total and 64 cores, and the code only uses around 1 TB. Interestingly, when I worked on smaller-scale optimization problems constructed using the same principle, the program returned the correct results. Hence, I am curious whether anyone has any insight into why this error occurs. For instance, does it occur because the problem is too large?

Other information:

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 32 virtual cores
Environment:
  LD_LIBRARY_PATH = /public1/soft/intel/2015/impi/5.0.3.049/intel64/lib:/public1/soft/intel/2015/composer_xe_2015.6.233/debugger/libipt/intel64/lib:/public1/soft/intel/2015/composer_xe_2015.6.233/tbb/lib/intel64/gcc4.4:/public1/soft/intel/2015/composer_xe_2015.6.233/mkl/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/tools/intel64/perfsys:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/../compiler/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/mpirt/lib/intel64:/public1/soft/intel/2015/composer_xe_2015.6.233/compiler/lib/intel64
  LD_LIBRARY_PATH_modshare = /public1/soft/intel/2015/composer_xe_2015.6.233/tbb/lib/intel64/gcc4.4:1:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/tools/intel64/perfsys:1:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/../compiler/lib/intel64:1:/public1/soft/intel/2015/composer_xe_2015.6.233/compiler/lib/intel64:1:/public1/soft/intel/2015/composer_xe_2015.6.233/debugger/libipt/intel64/lib:1:/public1/soft/intel/2015/composer_xe_2015.6.233/mkl/lib/intel64:1:/public1/soft/intel/2015/composer_xe_2015.6.233/mpirt/lib/intel64:1:/public1/soft/intel/2015/impi/5.0.3.049/intel64/lib:1:/public1/soft/intel/2015/composer_xe_2015.6.233/ipp/lib/intel64:1

That sure looks like an out of memory error to me. How confident are you really that it only requires 1TB of RAM?

Hi there @WellWellww, welcome to the forum.

How dense is your matrix? It seems like SCS is failing when it tries to factor your matrix.

You should try the indirect linear solver: GitHub - jump-dev/SCS.jl: Julia Wrapper for SCS (https://github.com/cvxgrp/scs)

using JuMP, SCS
model = Model(SCS.Optimizer)
set_attribute(model, "linear_solver", SCS.IndirectSolver)

Thank you so much for your warm welcome and helpful suggestions @odow. The matrix should be sparse. I did as you suggested, and SCS.IndirectSolver has now been running for more than three days, which is longer than SCS.DirectSolver ran before it crashed, but it still has not produced a result. When I get the result, I will report back here. Thanks again!

How sparse? Do you have the output of the SCS log? How many nonzeros?

@odow By “SCS log”, do you mean the file created by set_optimizer_attributes(model, "write_data_filename" => "scs_prob.dat")? I’m new to optimization and Julia, and I’m not sure how to find the number of nonzeros… :smiling_face_with_tear:

What is the content that is printed to the screen?

It looks something like

------------------------------------------------------------------
	       SCS v3.2.3 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 3, constraints m: 5
cones: 	  z: primal zero / dual free vars: 2
	  s: psd vars: 3, ssize: 1
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 5, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.65e+01  1.60e-01  5.09e+01 -2.91e+01  1.00e-01  8.07e-05
    50| 1.74e-08  2.70e-10  4.88e-08 -4.00e+00  1.00e-01  1.45e-04
------------------------------------------------------------------
status:  solved
timings: total: 1.46e-04s = setup: 5.31e-05s + solve: 9.33e-05s
	 lin-sys: 1.06e-05s, cones: 4.86e-05s, accel: 2.50e-06s
------------------------------------------------------------------
objective = -4.000000
------------------------------------------------------------------
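
The `nnz(A)` line in that banner is the number of nonzeros. Before committing to a multi-day solve, the size of the model can also be read directly from JuMP. A minimal sketch on a made-up toy model (the variables and constraints here are placeholders, not the model from this thread):

```julia
using JuMP

# Hypothetical toy model standing in for the real one.
model = Model()
@variable(model, x[1:3] >= 0)
@constraint(model, sum(x) == 1)
@constraint(model, x[1] + 2x[2] <= 4)

# Total counts; variable bounds are included in the constraint count.
println("variables:   ", num_variables(model))
println("constraints: ", num_constraints(model; count_variable_in_set_constraints = true))

# Break the constraints down by (function type, set type).
for (F, S) in list_of_constraint_types(model)
    println(F, " in ", S, ": ", num_constraints(model, F, S))
end
```

These are counts at the JuMP level. The `n`, `m`, and `nnz(A)` that SCS prints describe the conic form it receives after reformulation, so the numbers will not necessarily match.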

Thank you for your reply. SCS is still running… It has been running for more than three days and has not finished. Whether it finishes or fails, I will report back here.

However, I do have the results for a smaller-scale version of the same problem:

------------------------------------------------------------------
	       SCS v3.2.3 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 18846055, constraints m: 56379024
cones: 	  z: primal zero / dual free vars: 37612021
	  l: linear vars: 2
	  s: psd vars: 18767001, ssize: 1
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 100000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 58322374, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 1.00e+00  1.00e+01  3.79e+01  1.89e+01  1.00e-01  3.75e+02 
   250| 4.79e-05  1.45e-04  3.20e-06 -2.69e-01  1.24e+01  4.49e+04 
------------------------------------------------------------------
status:  solved
timings: total: 4.49e+04s = setup: 1.72e+02s + solve: 4.48e+04s
	 lin-sys: 8.53e+02s, cones: 4.35e+04s, accel: 5.90e+01s
------------------------------------------------------------------

Stupid reply from me: have you run ‘top’, or better ‘htop’, on the server that is running the long-running code? Is it behaving as you would expect, i.e. lots of CPU utilisation?

While SCS.IndirectSolver is running, including during the presolve phase, only one core is fully occupied most of the time, but occasionally htop looks like this:


The total number of CPUs is 64.

problem: variables n: 18846055, constraints m: 56379024
cones: z: primal zero / dual free vars: 37612021
l: linear vars: 2
s: psd vars: 18767001, ssize: 1

nnz(A): 58322374

This is a very large PSD problem. As you can see, this one solves in 4e4 seconds, so I assume it’s about the limit of what you can solve in a reasonable amount of time.
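
For a sense of scale, the `psd vars` line can be turned back into a matrix size. A rough sketch, assuming SCS counts PSD variables as the d(d+1)/2 entries of one triangle of a d×d symmetric matrix and that `ssize: 1` means a single PSD cone (`psd_side` is a name made up for this example):

```julia
# Recover the side length d of a symmetric matrix from the number of
# entries in one triangle, n = d(d+1)/2, i.e. d = (sqrt(8n + 1) - 1) / 2.
psd_side(nvars::Integer) = (isqrt(8 * nvars + 1) - 1) ÷ 2

d = psd_side(18_767_001)
println("PSD matrix side length: ", d)

# Numbers copied from the log of the smaller problem.
cone_time = 4.35e4   # "cones" entry of the timings line, in seconds
total_time = 4.49e4  # "total" entry of the timings line, in seconds
iters = 250          # last iteration printed in the log

println("share of time in cone projections: ",
        round(100 * cone_time / total_time; digits = 1), " %")
println("average cone time per iteration: ", cone_time / iters, " s")
```

Under those assumptions this is a single 6126×6126 PSD matrix, and the cone projections account for about 97% of the run time, roughly 174 s per iteration on average. That suggests shrinking or splitting the PSD block would likely matter more than the choice of linear solver.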

As a general rule of thumb, I tend to think of problems with <10^4 variables as easy, 10^5 as getting a bit hard, 10^6 as hard, 10^7 as very hard, and 10^8 as nigh impossible.

Why is the problem so large? What are you modeling? Have you looked at ways to simplify the problem?


There could be some redundancy. We are trying to simplify the problem.
