Why doesn't the solver reach the optimal solution?

Hello, I am working on improving a formulation of an optimization problem with JuMP and have encountered an issue that has been blocking me for several days. After extensive debugging and manual verification, I am reasonably confident that the formulation itself is correct. I now suspect the issue may be numerical, but I am unsure.

Below is the REPL output from three runs:

  • “Without DD”: the base formulation, which finds the optimal solution.

  • “With DD”: the formulation augmented with additional constraints intended to reduce the search space.

  • “With DD (optimal w start values)”: same as above, but initialized with optimal values for a subset of variables w, obtained from the “without DD” run.

In the first “with DD” run, the solver finds a worse solution (this is a maximization problem), which could suggest that the added constraints are invalid. However, when providing the optimal w values as a warm start, the solver reaches a better solution. This seems to rule out a feasibility issue.

My current suspicion is that the issue is related to the lazy constraints callback. The added constraints are indicator constraints modeled using a big-M formulation:

z \leq c^\top y^{(j)} + (1-\lambda_j) M, \forall j
d(w)^\top y \geq d(w)^\top y^{(j)} + \epsilon - \lambda_j M, \forall j

Here, d(w)^\top y is linearized internally, and z is the epigraph variable for the objective. During branch-and-bound, the callback identifies solutions y^{(j)} that fail a robustness check at integer nodes and adds these constraints in a Kelley cutting-plane style.

The binary variables $\lambda$​ act as indicators: they activate one constraint or the other depending on whether a strict inequality condition is satisfied.

Given the use of both a large M and a small \epsilon (10^{-4}), I suspect that numerical issues related to this formulation may be causing the behavior I observe, but Gurobi’s output is not flagging any.

Any suggestions on how to diagnose or mitigate this would be greatly appreciated. I am running out of ideas.

=== Without DD ===
Set parameter OutputFlag to value 1
Predicted feature coefficients: [0.33, 0.66, 0.05, -0.04, 0.64]
Set parameter OutputFlag to value 1
Set parameter LazyConstraints to value 1
Gurobi Optimizer version 13.0.2 build v13.0.2rc1 (win64 - Windows 11+.0 (26200.2))

CPU model: Intel(R) Core(TM) Ultra 7 155U, instruction set [SSE2|AVX|AVX2]
Thread count: 12 physical cores, 14 logical processors, using up to 14 threads

Non-default parameters:
TimeLimit  30
LazyConstraints  1

Optimize a model with 195 rows, 395 columns and 930 nonzeros (Max)
Model fingerprint: 0x2f84e83d
Model has 35 linear objective coefficients
Variable types: 40 continuous, 355 integer (355 binary)
Coefficient statistics:
 Matrix range     [5e-03, 6e+00]
 Objective range  [2e-02, 3e-01]
 Bounds range     [8e-01, 3e+00]
 RHS range        [3e+00, 7e+00]

Presolve time: 0.00s
Presolved: 195 rows, 395 columns, 930 nonzeros
Variable types: 40 continuous, 355 integer (355 binary)
Found heuristic solution: objective 2.4156729
Found heuristic solution: objective 3.2507686

Root relaxation: objective 3.883178e+00, 72 iterations, 0.00 seconds (0.00 work units)

   Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

    0     0    3.88318    0    3    3.25077    3.88318  19.5%     -    0s
H    0     0                       3.5070046    3.88318  10.7%     -    0s
H    0     0                       3.5862280    3.88318  8.28%     -    0s
    0     0    3.77162    0    3    3.58623    3.77162  5.17%     -    0s
    0     0    3.62513    0    5    3.58623    3.62513  1.08%     -    0s
    0     0    3.60859    0    3    3.58623    3.60859  0.62%     -    0s
H    0     0                       3.6085581    3.60859  0.00%     -    0s

Cutting planes:
 Cover: 3

Explored 1 nodes (105 simplex iterations) in 0.05 seconds (0.01 work units)
Thread count was 14 (of 14 available processors)

Solution count 5: 3.60856 3.58623 3.507 ... 2.41567

Optimal solution found (tolerance 1.00e-04)
Best objective 3.608558110543e+00, best bound 3.608592531863e+00, gap 0.0010%

User-callback calls 169, time in user-callback 0.04 sec
Finished in 0.088021 seconds
=== With DD, no start_value ===
Set parameter OutputFlag to value 1
Set parameter OutputFlag to value 1
Set parameter LazyConstraints to value 1
Gurobi Optimizer version 13.0.2 build v13.0.2rc1 (win64 - Windows 11+.0 (26200.2))

CPU model: Intel(R) Core(TM) Ultra 7 155U, instruction set [SSE2|AVX|AVX2]
Thread count: 12 physical cores, 14 logical processors, using up to 14 threads

Non-default parameters:
TimeLimit  30
LazyConstraints  1

Optimize a model with 416 rows, 225 columns and 1847 nonzeros (Max)
Model fingerprint: 0x13de0b8e
Model has 35 linear objective coefficients
Variable types: 165 continuous, 60 integer (60 binary)
Coefficient statistics:
 Matrix range     [5e-03, 6e+00]
 Objective range  [2e-02, 3e-01]
 Bounds range     [8e-01, 3e+00]
 RHS range        [3e+00, 7e+00]

Presolve removed 36 rows and 5 columns
Presolve time: 0.00s
Presolved: 380 rows, 220 columns, 1758 nonzeros
Variable types: 160 continuous, 60 integer (60 binary)

Root relaxation: objective 3.011084e+00, 236 iterations, 0.00 seconds (0.00 work units)

   Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

    0     0    3.01108    0   10          -    3.01108      -     -    0s
    0     0    3.00233    0   15          -    3.00233      -     -    0s
H    0     0                       2.7511007    3.00099  9.08%     -    0s
    0     0    3.00099    0   16    2.75110    3.00099  9.08%     -    0s
    0     0    3.00099    0   15    2.75110    3.00099  9.08%     -    0s
    0     0    3.00043    0   16    2.75110    3.00043  9.06%     -    0s
    0     0    2.99906    0   18    2.75110    2.99906  9.01%     -    0s
    0     0    2.99865    0   20    2.75110    2.99865  9.00%     -    0s
    0     0    2.99858    0   17    2.75110    2.99858  9.00%     -    0s
    0     0    2.99831    0   20    2.75110    2.99831  8.99%     -    0s
    0     0    2.99798    0   20    2.75110    2.99798  8.97%     -    0s
    0     0    2.99796    0   20    2.75110    2.99796  8.97%     -    0s
    0     2    2.99796    0   20    2.75110    2.99796  8.97%     -    0s
H   29    35                       2.8368077    2.92023  2.94%  20.7    0s
*   72     6              12       2.8764041    2.88488  0.29%  12.8    0s

Cutting planes:
 Gomory: 3
 Cover: 10
 MIR: 5
 RLT: 1

Explored 113 nodes (1427 simplex iterations) in 0.13 seconds (0.07 work units)
Thread count was 14 (of 14 available processors)

Solution count 3: 2.8764 2.83681 2.7511 

Optimal solution found (tolerance 1.00e-04)
Best objective 2.876404113481e+00, best bound 2.876404113481e+00, gap 0.0000%

User-callback calls 512, time in user-callback 0.03 sec
Finished in 0.1344096 seconds
=== With DD, optimal w start_value ===
Set parameter OutputFlag to value 1
Set parameter OutputFlag to value 1
Set parameter LazyConstraints to value 1
Gurobi Optimizer version 13.0.2 build v13.0.2rc1 (win64 - Windows 11+.0 (26200.2))

CPU model: Intel(R) Core(TM) Ultra 7 155U, instruction set [SSE2|AVX|AVX2]
Thread count: 12 physical cores, 14 logical processors, using up to 14 threads

Non-default parameters:
TimeLimit  30
LazyConstraints  1

Optimize a model with 416 rows, 225 columns and 1847 nonzeros (Max)
Model fingerprint: 0x41b88368
Model has 35 linear objective coefficients
Variable types: 165 continuous, 60 integer (60 binary)
Coefficient statistics:
 Matrix range     [5e-03, 6e+00]
 Objective range  [2e-02, 3e-01]
 Bounds range     [8e-01, 3e+00]
 RHS range        [3e+00, 7e+00]

User MIP start produced solution with objective 3.69862 (0.35s)
Loaded user MIP start with objective 3.69862

Presolve removed 36 rows and 5 columns
Presolve time: 0.00s
Presolved: 380 rows, 220 columns, 1758 nonzeros
Variable types: 160 continuous, 60 integer (60 binary)

Root relaxation: objective 3.883014e+00, 229 iterations, 0.00 seconds (0.00 work units)

   Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

    0     0    3.88301    0    3    3.69862    3.88301  4.99%     -    0s
    0     0    3.73417    0    3    3.69862    3.73417  0.96%     -    0s
    0     0    3.71520    0    2    3.69862    3.71520  0.45%     -    0s

Cutting planes:
 Cover: 2
 RLT: 1

Explored 1 nodes (240 simplex iterations) in 0.36 seconds (0.01 work units)
Thread count was 14 (of 14 available processors)

Solution count 1: 3.69862 

Optimal solution found (tolerance 1.00e-04)
Best objective 3.698624979138e+00, best bound 3.698659508269e+00, gap 0.0009%

User-callback calls 212, time in user-callback 0.35 sec
Finished in 0.7573941 seconds

I don’t know why, but you’re definitely solving different models in each case.

The first fingerprint should be different from the second and third ones, right ?

But yes, the only difference between 2 and 3 is whether this line is commented out :
JuMP.set_start_value.(master_model[:w], collect(Main.w_opt))

I also just noticed that the objective value of run 3 is in fact higher than the base model.

I’ll add that the algorithm used to converge to the same solution with or without the DD cuts when it was implemented as an iterative algorithm, where the lazy constraints were added after solving the master problem using the indicator syntax $\lambda --> {const}, as opposed to the current callback implementation.

I’m getting truly mad here. Here are some additional tests I made

Solving the linear relaxation of the two models without the callback (that’s the only change: no callback is added to the model and integrality is relaxed) gives the same objective value for both. The same as the no-DD root relaxation (3.88…).

So I figured that perhaps the presence of the callback changes something at the root node presolve. Yet, when the callback is present and the Presolve and Cuts attributes of Gurobi are set to 0, the root relaxations no longer match: 3.88 for the no-DD model, 3.01 for the DD model.

This doesn’t make any sense to me. The presence of a callback shouldn’t change the root relaxation value. Right ? Unless the callback is called at the root node? The DD constraints do make the root relaxation integer.