Out-of-memory errors using Ipopt to solve very large-scale trajectory optimization problems

Hi,

I’m trying to use Ipopt via NLPModelsIpopt.jl in order to solve large trajectory optimization problems. It is essentially an optimal control problem, to minimize \int_{0}^{T} g(x, u(t)) \, dt subject to the constraints \dot{x} = f(x, u(t), t) where u(t) is the control to be optimized. I am using a Simpson-Hermite discretization. Due to the partially separable nature of the constraints, I can compute the constraint Jacobian and Hessian-of-Lagrangian in this case by computing the contribution from each timestep and piecing the contributions together manually. Thus, I generate fast code for the timestep derivatives using SymPy’s common subexpression elimination, and then I assemble these into the full sparse Jacobian/Hessian.

The final derivative oracle implementation is reasonably fast, able to fill ~1e7 nonzero Hessian entries in ~0.1s (although not allocation free). If I then use NLPModelsIpopt to solve the actual problem (Ipopt using HSL Ma97 solver), it works well for small problems around ~1e6 nonzero Hessian entries. However, as I scale up to problems with ~5e6 nonzero Hessian entries, Ipopt frequently hangs for extended periods during the solve, and eventually the kernel just kills the process due to out-of-memory error. Given that I need to ultimately be scaling up to problems of size ~1.5e7 nonzero Hessian entries, this is a big issue.

Does anyone have an idea as to what might be going wrong? I’m happy to share my code as needed. Any help would be greatly appreciated!

Thanks!

Do you have a log of the solve? Does ipopt report the same number of nonzeros as you provide?

See also Ipopt solves large-scale nonlinear problems - #3 by ccoffrin

Here are example logs from NLPModels and Ipopt. The slight discrepancy in number of Hessian nonzeros can be chalked up to the fixed variables in the problem – when the fixed variables are substituted many of the Hessian entries become structural zeros. Accounting for this, the number of nonzeros is correct.

julia> vap
VarAssimProblem
  Problem name: Generic
   All variables: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 140042 All constraints: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 120000
            free: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                 free: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
           lower: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                lower: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
           upper: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                upper: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
         low/upp: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 140035         low/upp: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
           fixed: β–ˆβ‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 7                fixed: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 120000
          infeas: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0               infeas: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
            nnzh: ( 99.98% sparsity)   1640173          linear: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
                                                    nonlinear: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 120000
                                                         nnzj: ( 99.99% sparsity)   1600000

  Counters:
             obj: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                 grad: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                 cons: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
        cons_lin: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0             cons_nln: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                 jcon: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
           jgrad: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                  jac: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0              jac_lin: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
         jac_nln: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                jprod: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0            jprod_lin: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
       jprod_nln: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0               jtprod: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0           jtprod_lin: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
      jtprod_nln: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                 hess: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0                hprod: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     
           jhess: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0               jhprod: β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹…β‹… 0     


julia> stats = ipopt(vap; output_file = "ipopt.out")

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.10, running with linear solver ma97.

Number of nonzeros in equality constraint Jacobian...:  1460000
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:  1280116

Total number of variables............................:   140035
                     variables with only lower bounds:        0
                variables with lower and upper bounds:   140035
                     variables with only upper bounds:        0
Total number of equality constraints.................:   120000
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  3.3346424e+08 1.01e+04 1.00e+02   0.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  3.1900064e+08 9.78e+03 2.00e+04   1.4 1.22e+03    -  8.09e-03 2.66e-02f  1
   2  3.1810064e+08 9.53e+03 1.57e+04   2.1 2.31e+02   2.0 1.17e-01 2.54e-02f  1
   3  3.1808754e+08 9.53e+03 1.85e+04  -4.4 3.64e+02   1.5 4.95e-02 3.11e-04f  1
   4  3.1798975e+08 9.50e+03 1.85e+04  -4.4 7.81e+02   1.9 1.74e-02 2.59e-03f  1
   5  3.1756847e+08 9.44e+03 2.72e+04   3.6 2.36e+03   1.5 6.13e-02 6.48e-03f  1
   6  3.1755087e+08 9.44e+03 4.77e+04  -3.6 1.02e+03   1.0 3.98e-02 2.60e-04f  1
   7  3.1427224e+08 9.30e+03 6.80e+04   4.4 2.27e+03   1.4 7.73e-03 2.07e-02f  1
   8  3.1425074e+08 9.29e+03 3.67e+05  -3.4 1.06e+03   1.8 3.82e-02 4.42e-04f  1
   9  3.1130856e+08 8.95e+03 5.21e+08   3.6 7.27e+02   1.4 7.96e-02 3.72e-02f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  3.1098421e+08 8.91e+03 4.76e+08   4.2 1.51e+03   1.8 1.41e-01 4.22e-03f  1
  11  3.1081839e+08 8.87e+03 4.75e+08   4.5 9.91e+02   3.1 1.68e-03 4.19e-03f  1
  12  3.1082067e+08 8.87e+03 4.75e+08  -2.5 5.92e+03   3.6 7.54e-06 4.19e-05h  1
  13  3.1081862e+08 8.87e+03 4.47e+08   4.5 7.44e+02   4.0 5.98e-02 6.09e-05h  1
  14  3.1047429e+08 8.79e+03 4.10e+08   4.5 9.82e+02   3.5 7.23e-02 8.91e-03f  1
  15  3.1046563e+08 8.79e+03 4.02e+08   4.5 6.38e+02   5.7 5.89e-02 2.57e-04f  1
  16  3.1036491e+08 8.76e+03 6.84e+08   4.5 6.97e+02   6.2 9.47e-02 2.95e-03f  1
  17  3.1007319e+08 8.69e+03 2.66e+09   4.5 8.07e+02   6.6 4.07e-02 8.46e-03h  1
  18  3.1003652e+08 8.68e+03 2.16e+10   4.5 7.43e+02   7.9 7.53e-02 1.06e-03h  1
  19  3.0989937e+08 8.65e+03 1.65e+11   4.5 1.28e+03   7.4 3.80e-02 3.60e-03h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  20  3.0988052e+08 8.64e+03 4.25e+12   4.5 4.64e+02   8.8 1.02e-01 5.23e-04h  1
  21  3.0983646e+08 8.63e+03 9.64e+13   4.5 7.81e+02   9.2 2.49e-02 1.09e-03f  1
  22  3.0981128e+08 8.63e+03 4.60e+14   4.5 7.75e+02  10.5 2.92e-03 6.27e-04f  1
  23  3.0950517e+08 8.56e+03 1.27e+15   4.5 8.43e+02  10.0 2.06e-02 7.65e-03f  1
  24  3.0950279e+08 8.56e+03 1.64e+15   4.5 6.98e+02  12.3 1.37e-04 6.30e-05h  1
  25r 3.0950279e+08 8.56e+03 9.53e+02   4.5 0.00e+00  12.7 0.00e+00 3.38e-07R  2
  26r 3.0814249e+08 5.35e+03 3.80e+05   3.3 2.89e+03    -  4.70e-02 6.98e-02f  1
  27  3.0739749e+08 5.35e+03 9.72e+01   0.8 1.07e+03    -  2.89e-03 1.39e-03f  1
  28  3.0545202e+08 5.33e+03 1.23e+03   1.1 1.45e+03    -  1.22e-02 3.59e-03f  1
  29  3.0440703e+08 5.32e+03 6.76e+04   2.0 1.54e+03    -  6.09e-02 1.82e-03f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  30  3.0440616e+08 5.32e+03 8.65e+09   2.9 2.31e+02  12.2 4.04e-03 2.24e-05h  1
  31r 3.0440616e+08 5.32e+03 9.99e+02   2.7 0.00e+00  11.7 0.00e+00 1.13e-07R  2
  32r 3.0232750e+08 4.69e+03 3.81e+03   2.3 8.82e+04    -  2.49e-03 7.95e-04f  1
  33  3.0160485e+08 4.69e+03 1.80e+05   3.3 2.28e+04    -  1.24e-02 5.78e-04f  1
  34  2.8471940e+08 4.54e+03 1.19e+05   2.6 3.94e+02    -  2.60e-02 3.16e-02f  1
Killed

At the start of the Ipopt solve, RAM usage is at ~20% of maximum (running on a laptop with 16GB RAM total), but over the course of the solve, the RAM usage balloons to ~80% before the kernel kills the process. This suggests that perhaps the garbage collector is not doing its job fast enough? My derivative oracles allocate many array views throughout the solve, and this is the main source of allocations; the timestep derivative computations themselves are nonallocating, but the views are necessary (?) to fill the problem Hessian/Jacobian/variable arrays.

The log looks okay. I had wondered if it was a sparsity issue, but that doesn’t seem to be the case.

It’s hard to say more without a reproducible example.

It sounds like you’ve already written the callback functions, so you could try using the C API of Ipopt directly: GitHub - jump-dev/Ipopt.jl: Julia interface to the Ipopt nonlinear solver.

Alternatively, you could write the problem algebraically using JuMP and let us compute derivatives.

You could try Percival (GitHub - JuliaSmoothOptimizers/Percival.jl: Implementation of a Augmented Lagrangian method). It doesn’t use the explicit Jacobian and Hessian, but you would need to implement the hprod, jprod, and jtprod functions.

Apart from that, you could try profiling the memory of each function to try to reduce. One thing you can do to avoid creating things inside the function calls is to store them inside the VarAssimProblem structurem, e.g. AugLagModel.jl. They won’t get deallocated, but if your issue is allocating too many of them, this might help.

2 Likes