Don't understand the results of cuOpt

julia> import JuMP, cuOpt

julia> model = JuMP.Model();

julia> JuMP.@variable(model, x >= 0, Int);

julia> JuMP.@variable(model, 0 <= y <= 3);

julia> JuMP.@objective(model, Min, 12x + 20y);

julia> JuMP.@constraint(model, 6x + 8y >= 100);

julia> JuMP.@constraint(model, 7x + 12y >= 120);

julia> JuMP.set_optimizer(model, cuOpt.Optimizer);

julia> JuMP.set_silent(model)

julia> JuMP.optimize!(model)
Setting parameter log_to_console to false
Setting parameter log_file to 

julia> JuMP.solution_summary(model)
solution_summary(; result = 1, verbose = false)
├ solver_name          : cuOpt
├ Termination
│ ├ termination_status : OPTIMAL
│ ├ result_count       : 1
│ ├ raw_status         : cuOptModelStatusOptimal
│ └ objective_bound    : 2.05000e+02
├ Solution (result = 1)
│ ├ primal_status        : FEASIBLE_POINT
│ ├ dual_status          : FEASIBLE_POINT
│ ├ objective_value      : 2.05000e+02
│ └ dual_objective_value : NaN
└ Work counters
  └ solve_time (sec)   : 1.32267e-01

Why is the dual_status FEASIBLE_POINT? This is a MIP, so I would have expected NO_SOLUTION, particularly given the NaN in dual_objective_value.
And the message "Setting parameter log_file to" appears to be truncated.

Also, the appearance of the "Setting parameter …" messages does not seem right, given that I silenced the model beforehand: optimize! should print nothing to the console, I think.
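For comparison, here is the solver-independent pattern I would have expected (a sketch built on a minimal MIP; with most solvers, a MIP reports no dual solution, so dual queries are guarded rather than signaled with NaN):

```julia
import JuMP, cuOpt

# Minimal MIP to reproduce the question (illustrative sketch only)
model = JuMP.Model(cuOpt.Optimizer)
JuMP.set_silent(model)
JuMP.@variable(model, x >= 0, Int)
JuMP.@objective(model, Min, x)
JuMP.optimize!(model)

# The usual MOI convention for a MIP is dual_status == NO_SOLUTION,
# so user code should guard dual queries with has_duals:
if JuMP.has_duals(model)
    @show JuMP.dual_objective_value(model)
else
    println("no dual solution available (expected for a MIP)")
end
```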


One more issue: if I call optimize!(model) a second time, immediately after the first, the log output is exactly the same. Does this imply that the problem is re-solved from scratch? (Going by my experience with Gurobi.)

Here is another question: why does cuOpt still use the concurrent algorithm (see the 🟠 markers), given that the number of threads is set to 1?

julia> JuMP.MOI.set(model, JuMP.MOI.NumberOfThreads(), 1)

julia> JuMP.optimize!(model)
Setting parameter num_cpu_threads to 1
cuOpt version: 25.10.1, git hash: 876fcfc, host arch: x86_64, device archs: 70-real,75-real,80-real,86-real,90a-real,100f-real,120a-real,120
CPU: AMD EPYC 7763 64-Core Processor, threads (physical/logical): 128/256, RAM: 254.29 GiB
CUDA 12.9, device: NVIDIA RTX A6000 (ID 0), VRAM: 44.43 GiB
CUDA device UUID: 5effffff83ffffffe3ffffffd5-3610-ffff

Solving a problem with 2 constraints, 2 variables (0 integers), and 4 nonzeros
Problem scaling:
Objective coefficents range:          [1e+01, 2e+01]
Constraint matrix coefficients range: [6e+00, 1e+01]
Constraint rhs / bounds range:        [0e+00, 1e+02]
Variable bounds range:                [0e+00, 3e+00]

Third-party presolve is disabled, skipping
Objective offset 0.000000 scaling_factor 1.000000
Running concurrent 🟠

Dual simplex finished in 0.00 seconds, total time 0.00 🟠
   Iter    Primal Obj.      Dual Obj.    Gap        Primal Res.  Dual Res.   Time
      0 +0.00000000e+00 +0.00000000e+00  0.00e+00   1.56e+02     2.33e+01   0.007s
Barrier finished in 0.01 seconds 🟠
PDLP finished 🟠
Concurrent time:  0.009s, total time 0.010s
Solved with dual simplex
Status: Optimal   Objective: 2.05000000e+02  Iterations: 2  Time: 0.010s
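If cuOpt exposes a method-selection parameter, one could force a single algorithm instead of the concurrent run. Note that both the parameter name and value below are my assumptions; please check the cuOpt settings documentation for the exact keys:

```julia
import JuMP, cuOpt

model = JuMP.Model(cuOpt.Optimizer)
# Hypothetical parameter name and value; verify against the cuOpt docs.
JuMP.set_attribute(model, "method", "dual_simplex")
```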

Another issue: FEASIBILITY_SENSE is not supported?

julia> import JuMP, cuOpt

julia> model = JuMP.Model(cuOpt.Optimizer);

julia> JuMP.@variable(model, x >= 1);

julia> JuMP.optimize!(model)
ERROR: MathOptInterface.UnsupportedAttribute{MathOptInterface.ObjectiveSense}: Attribute MathOptInterface.ObjectiveSense() is not supported by the model.
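A possible workaround (a sketch, assuming it is only the feasibility sense itself that the wrapper rejects) is to give the model an explicit constant objective, which sets an ObjectiveSense without changing the feasible set:

```julia
import JuMP, cuOpt

model = JuMP.Model(cuOpt.Optimizer)
JuMP.@variable(model, x >= 1)
# Turn the feasibility problem into a trivial optimization by
# minimizing a constant, so an explicit ObjectiveSense is set:
JuMP.@objective(model, Min, 0)
JuMP.optimize!(model)
```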

Okay, ^C interrupt is not supported currently. (The following is presumably the PDLP/PDHG algorithm that runs on the GPU. Sorry to test you with a hard instance right at the start; it's one I found two days ago while working with Gurobi. I think the algorithm is good.)

 184000 +0.00000000e+00 +4.17350822e+12  4.17e+12   8.89e+01     5.41e+02   687.377s
 185000 +0.00000000e+00 +4.17351235e+12  4.17e+12   8.89e+01     5.41e+02   690.872s
^C 186000 +0.00000000e+00 +4.17351648e+12  4.17e+12   8.89e+01     5.41e+02   694.364s
 187000 +0.00000000e+00 +4.17352062e+12  4.17e+12   8.89e+01     5.41e+02   697.859s
 188000 +0.00000000e+00 +4.17352475e+12  4.17e+12   8.89e+01     5.41e+02   701.354s
 189000 +0.00000000e+00 +4.17352888e+12  4.17e+12   8.89e+01     5.41e+02   704.848s
 190000 +0.00000000e+00 +4.17353302e+12  4.17e+12   8.89e+01     5.41e+02   708.341s
 191000 +0.00000000e+00 +4.17353715e+12  4.17e+12   8.89e+01     5.41e+02   711.833s
^C 192000 +0.00000000e+00 +4.17354128e+12  4.17e+12   8.89e+01     5.41e+02   715.325s
^C^C^C 193000 +0.00000000e+00 +4.17354542e+12  4.17e+12   8.89e+01     5.41e+02   718.816s
 194000 +0.00000000e+00 +4.17354955e+12  4.17e+12   8.89e+01     5.41e+02   722.306s
 195000 +0.00000000e+00 +4.17355369e+12  4.17e+12   8.89e+01     5.41e+02   725.796s
^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: InterruptException:
Stacktrace:
 [1] cuOptSolve
   @ ~/.julia/packages/cuOpt/dPmC5/src/gen/libcuopt.jl:542 [inlined]
 [2] optimize!(model::cuOpt.Optimizer)
   @ cuOpt ~/.julia/packages/cuOpt/dPmC5/src/MOI_wrapper.jl:1023
 [3] optimize!
   @ ~/.julia/packages/MathOptInterface/zq9bo/src/MathOptInterface.jl:122 [inlined]
 [4] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{cuOpt.Optimizer, MathOptInterface.Utilities.UniversalFallback{MathOptInterface.Utilities.Model{Float64}}})
   @ MathOptInterface.Utilities ~/.julia/packages/MathOptInterface/zq9bo/src/Utilities/cachingoptimizer.jl:370
 [5] optimize!
   @ ~/.julia/packages/MathOptInterface/zq9bo/src/Bridges/bridge_optimizer.jl:367 [inlined]
 [6] optimize!(m::MathOptInterface.Utilities.CachingOptimizer{MathOptInterface.Bridges.LazyBridgeOptimizer{…}, MathOptInterface.Utilities.UniversalFallback{…}})
   @ MathOptInterface.Utilities ~/.julia/packages/MathOptInterface/zq9bo/src/Utilities/cachingoptimizer.jl:379
 [7] optimize!(model::JuMP.Model; ignore_optimize_hook::Bool, _differentiation_backend::MathOptInterface.Nonlinear.SparseReverseMode, kwargs::@Kwargs{})
   @ JuMP ~/.julia/packages/JuMP/N7h14/src/optimizer_interface.jl:609
 [8] optimize!(model::JuMP.Model)
   @ JuMP ~/.julia/packages/JuMP/N7h14/src/optimizer_interface.jl:560
 [9] top-level scope
   @ REPL[75]:1
Some type information was truncated. Use `show(err)` to see complete types.
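Until ^C is handled, a stopgap I would try is a wall-clock limit via the solver-independent JuMP attribute (this assumes the cuOpt wrapper supports MOI.TimeLimitSec, which I have not verified):

```julia
import JuMP, cuOpt

model = JuMP.Model(cuOpt.Optimizer)
JuMP.@variable(model, x >= 0)
JuMP.@objective(model, Min, x)
# Ask the solver to stop after 10 minutes instead of relying on ^C:
JuMP.set_time_limit_sec(model, 600.0)
JuMP.optimize!(model)
```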

This is the dual simplex's log; I think it's having some difficulty:

Solving an MIP
cuOpt version: 25.10.1, git hash: 876fcfc, host arch: x86_64, device archs: 70-real,75-real,80-real,86-real,90a-real,100f-real,120a-real,120
CPU: AMD EPYC 7763 64-Core Processor, threads (physical/logical): 128/256, RAM: 241.07 GiB
CUDA 12.9, device: NVIDIA RTX A6000 (ID 0), VRAM: 44.43 GiB
CUDA device UUID: 5effffff83ffffffe3ffffffd5-3610-ffff

Solving a problem with 5151290 constraints, 7825201 variables (2967600 integers), and 22917816 nonzeros
Problem scaling:
Objective coefficents range:          [1e+00, 1e+00]
Constraint matrix coefficients range: [9e-01, 2e+01]
Constraint rhs / bounds range:        [0e+00, 2e+02]
Variable bounds range:                [1e+00, 6e+01]

Original problem: 5151290 constraints, 7825201 variables, 22917816 nonzeros
Calling Papilo presolver
Presolve status: reduced the problem
Presolve removed: 115627 constraints, 1012673 variables, 2079106 nonzeros
Presolved problem: 5035663 constraints, 6812528 variables, 20838710 nonzeros
Papilo presolve time: 74.654472
Objective offset 0.000000 scaling_factor 1.000000
Running presolve!
Unused variables detected, eliminating them! Unused var count 1130
After trivial presolve: 5035344 constraints, 6811398 variables, objective offset 0.000000.
Using 255 CPU threads for B&B
Solving LP root relaxation
Scaling matrix. Maximum column norm 1.708977e+00
Dual Simplex Phase 1
Dual feasible solution found.
Dual Simplex Phase 2
 Iter     Objective           Num Inf.  Sum Inf.     Perturb  Time
    1 +2.0913494517664908e+05 1495302 1.94174718e+07 0.00e+00 13.31
 1000 +2.0913494517664908e+05 1494963 1.92560343e+07 0.00e+00 23.34
 2000 +2.0913494517664908e+05 1494614 1.91146367e+07 0.00e+00 33.33
 3000 +2.0913494517664908e+05 1494331 1.89908115e+07 0.00e+00 43.28
 4000 +2.0913494517664908e+05 1494008 1.88967770e+07 0.00e+00 55.98
 5000 +2.0913494517664908e+05 1493670 1.87939717e+07 0.00e+00 69.23
 6000 +2.0913494517664908e+05 1493300 1.86788550e+07 0.00e+00 78.97
 7000 +2.0913494517664908e+05 1492960 1.85976379e+07 0.00e+00 88.65
 8000 +2.0913494517664908e+05 1492664 1.85340391e+07 0.00e+00 101.11
 9000 +2.0913494517664908e+05 1492297 1.84429646e+07 0.00e+00 111.90
10000 +2.0913494517664908e+05 1491980 1.83855940e+07 0.00e+00 121.87
11000 +2.0913494517664908e+05 1491649 1.83121927e+07 0.00e+00 131.41
12000 +2.0913494517664908e+05 1491331 1.82492266e+07 0.00e+00 141.36
13000 +2.0913494517664908e+05 1491041 1.82330942e+07 0.00e+00 152.92
14000 +2.0913494517664908e+05 1490679 1.81204456e+07 0.00e+00 162.70
15000 +2.0913494517664908e+05 1490330 1.80459244e+07 0.00e+00 172.44
16000 +2.0913494517664908e+05 1490004 1.79864777e+07 0.00e+00 182.04
17000 +2.0913494517664908e+05 1489665 1.79471698e+07 0.00e+00 191.89
18000 +2.0913494517664908e+05 1489343 1.79209297e+07 0.00e+00 201.86
19000 +2.0913494517664908e+05 1489084 1.78946283e+07 0.00e+00 211.66
20000 +2.0913494517664908e+05 1488772 1.78676702e+07 0.00e+00 221.43
21000 +2.0913494517664908e+05 1488439 1.78218220e+07 0.00e+00 231.05
22000 +2.0913494517664908e+05 1488122 1.77605387e+07 0.00e+00 240.87
23000 +2.0913494517664908e+05 1487826 1.77420434e+07 0.00e+00 250.68
24000 +2.0913494517664908e+05 1487543 1.77258576e+07 0.00e+00 260.51
25000 +2.0913494517664908e+05 1487225 1.76601228e+07 0.00e+00 270.33
26000 +2.0913494517664908e+05 1486924 1.76055733e+07 0.00e+00 279.86
27000 +2.0913494517664908e+05 1486648 1.76000950e+07 0.00e+00 289.72
28000 +2.0913494517664908e+05 1486330 1.75823480e+07 0.00e+00 299.63
29000 +2.0913494517664908e+05 1486057 1.75265363e+07 0.00e+00 309.47
30000 +2.0913494517664908e+05 1485751 1.74864324e+07 0.00e+00 319.37
31000 +2.0913494517664908e+05 1485427 1.74810450e+07 0.00e+00 328.97
32000 +2.0913494517664908e+05 1485136 1.74718256e+07 0.00e+00 338.85
33000 +2.0913494517664908e+05 1484871 1.74719762e+07 0.00e+00 348.72
34000 +2.0913494517664908e+05 1484559 1.74443393e+07 0.00e+00 358.64
35000 +2.0913494517664908e+05 1484248 1.74177887e+07 0.00e+00 368.51
36000 +2.0913494517664908e+05 1483896 1.73945423e+07 0.00e+00 378.05
37000 +2.0913494517664908e+05 1483556 1.73798841e+07 0.00e+00 387.91
38000 +2.0913494517664908e+05 1483294 1.73668091e+07 0.00e+00 397.82
39000 +2.0913494517664908e+05 1483009 1.73522648e+07 0.00e+00 407.72
40000 +2.0913494517664908e+05 1482724 1.73495800e+07 0.00e+00 417.61
41000 +2.0913494517664908e+05 1482394 1.73035306e+07 0.00e+00 427.20
42000 +2.0913494517664908e+05 1482136 1.72653531e+07 0.00e+00 437.09
43000 +2.0913494517664908e+05 1481866 1.72491970e+07 0.00e+00 447.01
44000 +2.0913494517664908e+05 1481567 1.72209906e+07 0.00e+00 456.90
45000 +2.0913494517664908e+05 1481313 1.72257954e+07 0.00e+00 466.82
46000 +2.0913494517664908e+05 1480919 1.72723173e+07 0.00e+00 476.38
47000 +2.0913494517664908e+05 1480636 1.72955806e+07 0.00e+00 486.35
48000 +2.0913494517664908e+05 1480424 1.72939608e+07 0.00e+00 496.26
49000 +2.0913494517664908e+05 1480110 1.73177117e+07 0.00e+00 506.15

Feel free to open GitHub issues.

The wrapper is still new (and even though it’s in jump-dev, it’s developed by NVIDIA).

Since I was intrigued by the new PDHG algorithm run on GPUs, I had a chat with a chatbot, which told me that PDHG may have difficulty converging to high-accuracy solutions. It appears that I can observe this behavior in the log above.
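To make the slow-tail behavior concrete, here is a minimal textbook PDHG (Chambolle-Pock) iteration in plain Julia, applied to the LP relaxation of the small model from the top of this thread. This is only an illustrative sketch, not cuOpt's PDLP (no restarts, no adaptive step sizes, no rescaling):

```julia
using LinearAlgebra

# PDHG for  min c'x  s.t.  Ax >= b,  lo <= x <= hi,
# with dual multipliers y >= 0 for the inequality constraints.
function pdhg(A, b, c, lo, hi; iters = 200_000)
    τ = σ = 0.9 / opnorm(A)     # step sizes satisfying τ·σ·‖A‖² < 1
    x = zeros(length(c))
    y = zeros(length(b))
    for _ in 1:iters
        xnew = clamp.(x .- τ .* (c .- A' * y), lo, hi)       # primal step + box projection
        y = max.(y .+ σ .* (b .- A * (2 .* xnew .- x)), 0.0) # dual step with extrapolation
        x = xnew
    end
    return x, y
end

# LP relaxation of the thread's model:  min 12x₁ + 20x₂
#   s.t. 6x₁ + 8x₂ >= 100,  7x₁ + 12x₂ >= 120,  x₁ >= 0,  0 <= x₂ <= 3.
# Its optimum is x = (15.0, 1.25) with objective 205.
A = [6.0 8.0; 7.0 12.0]
x, y = pdhg(A, [100.0, 120.0], [12.0, 20.0], [0.0, 0.0], [Inf, 3.0])
println("x = ", x, ", objective = ", dot([12.0, 20.0], x))
```

Even on this 2×2 example, the last digits take far more iterations than the first few, which is consistent with the sublinear residual decay reported for plain PDHG; production PDLP codes add restarts and rescaling precisely to fight this.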

My short conclusion is

  • For realistically large-scale LPs, the current best algorithm is still the Barrier method (Method=2).
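For reference, the Gurobi setting mentioned in the bullet, expressed through JuMP (in Gurobi's Method parameter, the value 2 selects the barrier algorithm):

```julia
import JuMP, Gurobi

model = JuMP.Model(Gurobi.Optimizer)
JuMP.set_attribute(model, "Method", 2)  # Gurobi: Method = 2 is barrier
```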

Here is a comparative test, where I solve the same LP with Barrier and with PDHG respectively; PDHG takes about 2.5 times longer.

PDHG Logging

Gurobi Optimizer version 13.0.0 build v13.0.0rc1 (linux64gpu - "Debian GNU/Linux 12 (bookworm)")

CPU model: AMD EPYC 7763 64-Core Processor, instruction set [SSE2|AVX|AVX2]
Thread count: 128 physical cores, 256 logical processors, using up to 1 threads

GPU model: NVIDIA RTX A6000, CUDA compute version 8.6, NVIDIA driver compatible with CUDA version 12

Non-default parameters:
Method  6
PDHGGPU  1
Threads  1

Optimize a model with 5151292 rows, 7825249 columns and 22917912 nonzeros (Min)
Model fingerprint: 0xf87d8e3a
Model has 1 linear objective coefficients
Coefficient statistics:
  Matrix range     [9e-01, 2e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 6e+01]
  RHS range        [1e+00, 2e+02]
Presolve removed 1849487 rows and 1574270 columns
Presolve time: 29.36s
Presolved: 3301805 rows, 6250979 columns, 20473360 nonzeros

Start PDHG on GPU

                       Objective                Residual
     Iter       Primal          Dual         Primal    Dual     Compl    Time
        0   0.00000000e+00  0.00000000e+00  1.63e+05 1.00e+00  0.00e+00   32s
     1351   3.45645074e+05  1.36432249e+05  2.37e+02 3.05e-01  1.87e-02   35s
    27204   3.23363874e+05  3.28064453e+05  1.43e+00 1.37e-02  8.32e-06  105s
   138604   3.23803145e+05  3.23951709e+05  4.89e-03 2.64e-03  1.00e-07  415s
   140404   3.23803152e+05  3.23954701e+05  4.89e-03 2.64e-03  9.61e-08  420s
   268204   3.23803272e+05  3.23796486e+05  7.44e-04 1.00e-04  4.41e-08  775s
   270004   3.23803269e+05  3.23804678e+05  8.03e-04 9.89e-05  4.23e-08  780s
   343804   3.23803346e+05  3.23803199e+05  4.61e-03 9.95e-07  2.07e-09  985s
   345204   3.23803325e+05  3.23803190e+05  4.69e-03 9.59e-07  1.87e-09  989s

PDHG solved model in 345204 iterations and 989.17 seconds (73059.61 work units)
Optimal objective 3.23803325e+05

Barrier Logging

Ordering time: 4.16s

Barrier statistics:
 AA' NZ     : 1.277e+07
 Factor NZ  : 5.351e+07 (roughly 4.0 GB of memory)
 Factor Ops : 1.352e+09 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   9.80370103e+04 -4.66836255e+06  3.01e+06 0.00e+00  2.06e+01    42s
   1   9.80108593e+04 -4.46319395e+06  1.53e+06 4.10e-02  1.06e+01    45s
   2   9.82916048e+04 -4.07901327e+06  5.22e+05 1.88e-02  3.79e+00    47s
   3   1.76165956e+05 -2.95543088e+06  2.47e+05 2.22e-16  1.76e+00    50s
   4   2.51120530e+05 -1.67378137e+06  9.15e+04 3.05e-16  6.64e-01    53s
  29   3.25159236e+05  3.15877438e+05  1.37e+02 7.55e-15  1.01e-03   133s
  53   3.23997672e+05  3.23035836e+05  1.54e+01 1.48e-11  1.06e-04   214s
  54   3.23983041e+05  3.23085250e+05  1.41e+01 2.12e-11  9.88e-05   217s
 104   3.23803249e+05  3.23803248e+05  2.21e-05 2.15e-13  1.07e-10   395s

Barrier solved model in 104 iterations and 395.09 seconds (403.53 work units)
Optimal objective 3.23803249e+05

Besides, a single PDHG solve occupies one GPU device at 100%, whereas a single Barrier solve requires only one CPU thread (logical processor).

Is there a question here?
The literature on PDHG and its GPU variants has repeatedly demonstrated their relevance, but the method may not be appropriate for the specific instance you tried it on.