90% of Ipopt run time is spent in Hessian/constraint evaluation, very little in the linear solver. Is this normal?

Hello,
I have been trying to improve my solve time for nonlinear optimal control problem , it is a tool for solving minimum lap time for a race car GitHub - ceserik/SLapSim.jl: Scalable laptime simulator · GitHub. I am using JuMP + IPOPT + MUMPS, I have got licence for HSL and pardiso solver, but then I realized most of the time is not spent solving the linear system of equations by linear solver, but computing the hessian of my function. So the different linear solvers had no impact on time of solving.

This is a snippet from my code where I enforce Gauss-Lobatto quadrature for the states along time.

    # create constraints using the Butcher tableau of Lobatto IIIA
    segment_start_idx = 1
    # evaluate the car ODE function and add state-specific constraints for the first point
    f(X[segment_start_idx, :], U[segment_start_idx, :], s_all[segment_start_idx], model)
    for segment = 1:number_of_segments
        F = [f(X[segment_start_idx+col-1, :], U[segment_start_idx+col-1, :], s_all[segment_start_idx+col-1], nothing) for col in 1:stages]
        for stage = 2:stages
            fxSum = zeros(NonlinearExpr, nStates)
            for col = 1:stages
                fxSum += tableau.a[stage, col] * F[col]
            end
            # enforce Lobatto quadrature
            @constraint(model, X[segment_start_idx+stage-1, :] .== X[segment_start_idx, :] + h_all[segment] * fxSum)
            # evaluate the car ODE function and add state-specific constraints for the rest of the points
            f(X[segment_start_idx+stage-1, :], U[segment_start_idx+stage-1, :], s_all[segment_start_idx+stage-1], model)
        end
        segment_start_idx += stages - 1
    end

This snippet shows the tire function used by the car; it constrains the tire force according to a complex tire model built from trigonometric functions.

    function compute(inTorque::carVar, optiModel::Union{JuMP.Model,Nothing}=nothing)
        slipAngle.value = -atan(velocity.value[2], velocity.value[1])
        α = slipAngle.value * 1
        D = forces.value[3] * frictionCoefficient.value
        # Pacejka tire model
        forces.value[2] = D * sin(C * atan(B * α - E * (B * α - atan(B * α))))
        forces.value[1] = inTorque / radius.value + brakingForce.value + forces.value[3] * rollingResistance.value
    end

    function tireConstraints(model=nothing)
        μFz = frictionCoefficient.value * forces.value[3]
        fx = forces.value[1] / μFz
        fy = forces.value[2] / μFz
        @constraint(model, fx^2 + fy^2  <= 1.1^2)
    end

Time statistics of solver

Number of Iterations....: 318

                                   (scaled)                 (unscaled)
Objective...............:   4.3202620034505081e+01    8.6405240069010162e+01
Dual infeasibility......:   4.4689843001849362e-09    8.9379686003698724e-09
Constraint violation....:   2.6980963047834181e-09    1.1728004665201297e-06
Variable bound violation:   9.2138580376488477e-09    9.2138580376488477e-09
Complementarity.........:   1.0103392529261279e-09    2.0206785058522557e-09
Overall NLP error.......:   4.4689843001849362e-09    1.1728004665201297e-06


Number of objective function evaluations             = 391
Number of objective gradient evaluations             = 319
Number of equality constraint evaluations            = 391
Number of inequality constraint evaluations          = 391
Number of equality constraint Jacobian evaluations   = 319
Number of inequality constraint Jacobian evaluations = 319
Number of Lagrangian Hessian evaluations             = 318
Total seconds in IPOPT (w/o function evaluations)    =     10.715
Total seconds in NLP function evaluations            =    166.565


Timing Statistics:

OverallAlgorithm....................:    235.910 (sys:      1.131 wall:    177.280)
 PrintProblemStatistics.............:      0.000 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.287 (sys:      0.004 wall:      0.291)
 UpdateHessian......................:    121.763 (sys:      0.059 wall:    121.749)
 OutputIteration....................:      0.015 (sys:      0.002 wall:      0.012)
 UpdateBarrierParameter.............:      0.138 (sys:      0.004 wall:      0.084)
 ComputeSearchDirection.............:     38.595 (sys:      0.981 wall:      9.960)
 ComputeAcceptableTrialPoint........:     67.673 (sys:      0.056 wall:     37.725)
 AcceptTrialPoint...................:      0.030 (sys:      0.000 wall:      0.030)
 CheckConvergence...................:      7.400 (sys:      0.025 wall:      7.421)
PDSystemSolverTotal.................:     38.502 (sys:      0.979 wall:      9.942)
 PDSystemSolverSolveOnce............:     36.372 (sys:      0.966 wall:      9.413)
 ComputeResiduals...................:      2.013 (sys:      0.010 wall:      0.495)
 StdAugSystemSolverMultiSolve.......:     36.175 (sys:      0.962 wall:      9.387)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.031 (sys:      0.002 wall:      0.033)
 LinearSystemFactorization..........:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemBackSolve..............:      4.002 (sys:      0.006 wall:      1.032)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      0.000 (sys:      0.000 wall:      0.000)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.000 (sys:      0.000 wall:      0.000)
Task2...............................:      0.000 (sys:      0.000 wall:      0.000)
Task3...............................:      0.000 (sys:      0.000 wall:      0.000)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.000 (sys:      0.000 wall:      0.000)
Task6...............................:      0.000 (sys:      0.000 wall:      0.000)
Function Evaluations................:    196.333 (sys:      0.134 wall:    166.565)
 Objective function.................:      0.006 (sys:      0.000 wall:      0.006)
 Objective function gradient........:      0.018 (sys:      0.001 wall:      0.018)
 Equality constraints...............:     67.240 (sys:      0.054 wall:     37.462)
 Inequality constraints.............:      0.137 (sys:      0.000 wall:      0.138)
 Equality constraint Jacobian.......:      0.039 (sys:      0.000 wall:      0.039)
 Inequality constraint Jacobian.....:      7.175 (sys:      0.020 wall:      7.193)
 Lagrangian Hessian.................:    121.718 (sys:      0.058 wall:    121.708)

EXIT: Optimal Solution Found.

I want to ask: is it normal that 90% of Ipopt's run time is spent in function evaluations (Hessian and equality constraints at the bottom) and very little in solving the linear system with HSL/MUMPS/Pardiso? Does this hint that I formulated the model improperly?

If it is useful for somebody, the Hessian and Jacobian sparsity patterns look like this:


If you have lots of parameters, computing the exact Hessian (which is the default in Ipopt, I believe) is impractical — you’ll want to change it to use a limited-memory approximation.
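For reference, a minimal sketch of how to switch this on from JuMP (not from the original post; `hessian_approximation` and `limited_memory_max_history` are documented Ipopt options, and `set_attribute` is the standard JuMP way to pass them):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
# Replace the exact Lagrangian Hessian with a limited-memory (L-BFGS) approximation,
# so Ipopt never asks the modeling layer for second derivatives.
set_attribute(model, "hessian_approximation", "limited-memory")
# Optionally keep more curvature pairs than Ipopt's default of 6.
set_attribute(model, "limited_memory_max_history", 25)
```

Whether this wins overall depends on the problem: each iteration gets cheaper, but the iteration count often grows.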


Thanks for the answer. By parameters, do you mean JuMP parameters? Right now I am computing the exact Hessian; I have tried the limited-memory approximation, but the convergence and solve time were a lot worse.

I have only 10 JuMP parameters. After removing them the solve time improved, but the solver still spends 90% of its time on Hessian and constraint evaluation. Or do you mean variables? I have this many variables:

Variables: 7784, Constraints: 25653
This is Ipopt version 3.14.19, running with linear solver MUMPS 5.8.2.

Number of nonzeros in equality constraint Jacobian...:   100116
Number of nonzeros in inequality constraint Jacobian.:    23308
Number of nonzeros in Lagrangian Hessian.............:   494439

Total number of variables............................:     7770
                     variables with only lower bounds:        0
                variables with lower and upper bounds:     7770
                     variables with only upper bounds:        0
Total number of equality constraints.................:     5438
Total number of inequality constraints...............:     4661
        inequality constraints with only lower bounds:     1553
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:     3108

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.9800000e+02 3.14e+02 8.14e-01   0.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  1.9674790e+02 2.85e+02 4.19e+01  -0.7 4.23e-01    -  4.87e-02 6.86e-01f  1
   2  1.9763018e+02 2.48e+02 3.11e+02  -0.9 4.52e-01   2.0 7.18e-01 1.32e-01h  1
   3  1.9949705e+02 9.78e+01 1.76e+02  -1.0 1.95e-01   1.5 1.00e+00 6.34e-01h  1
...
...
...
Number of Iterations....: 388

                                   (scaled)                 (unscaled)
Objective...............:   4.3596959242692527e+01    8.7193918485385055e+01
Dual infeasibility......:   1.1608680775887709e-09    2.3217361551775417e-09
Constraint violation....:   5.4926496038044396e-08    2.3883271637714643e-05
Variable bound violation:   9.9912477527497323e-09    9.9912477527497323e-09
Complementarity.........:   7.9815827401899534e-11    1.5963165480379907e-10
Overall NLP error.......:   5.4926496038044396e-08    2.3883271637714643e-05


Number of objective function evaluations             = 419
Number of objective gradient evaluations             = 381
Number of equality constraint evaluations            = 419
Number of inequality constraint evaluations          = 419
Number of equality constraint Jacobian evaluations   = 390
Number of inequality constraint Jacobian evaluations = 390
Number of Lagrangian Hessian evaluations             = 388
Total seconds in IPOPT (w/o function evaluations)    =     16.473
Total seconds in NLP function evaluations            =    144.390


Timing Statistics:

OverallAlgorithm....................:    231.715 (sys:      1.799 wall:    160.863)
 PrintProblemStatistics.............:      0.000 (sys:      0.000 wall:      0.000)
 InitializeIterates.................:      0.220 (sys:      0.001 wall:      0.221)
 UpdateHessian......................:    104.623 (sys:      0.071 wall:    104.688)
 OutputIteration....................:      0.020 (sys:      0.004 wall:      0.015)
 UpdateBarrierParameter.............:     40.438 (sys:      1.142 wall:     10.588)
 ComputeSearchDirection.............:     18.471 (sys:      0.408 wall:      4.834)
 ComputeAcceptableTrialPoint........:     60.512 (sys:      0.149 wall:     33.061)
 AcceptTrialPoint...................:      0.034 (sys:      0.001 wall:      0.035)
 CheckConvergence...................:      7.387 (sys:      0.024 wall:      7.415)
PDSystemSolverTotal.................:     57.435 (sys:      1.460 wall:     15.008)
 PDSystemSolverSolveOnce............:     55.137 (sys:      1.433 wall:     14.387)
 ComputeResiduals...................:      2.111 (sys:      0.016 wall:      0.573)
 StdAugSystemSolverMultiSolve.......:     56.098 (sys:      1.456 wall:     14.655)
 LinearSystemScaling................:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemSymbolicFactorization..:      0.033 (sys:      0.000 wall:      0.033)
 LinearSystemFactorization..........:      0.000 (sys:      0.000 wall:      0.000)
 LinearSystemBackSolve..............:      7.138 (sys:      0.029 wall:      1.850)
 LinearSystemStructureConverter.....:      0.000 (sys:      0.000 wall:      0.000)
  LinearSystemStructureConverterInit:      0.000 (sys:      0.000 wall:      0.000)
QualityFunctionSearch...............:      1.153 (sys:      0.079 wall:      0.317)
TryCorrector........................:      0.000 (sys:      0.000 wall:      0.000)
Task1...............................:      0.311 (sys:      0.019 wall:      0.086)
Task2...............................:      0.441 (sys:      0.014 wall:      0.117)
Task3...............................:      0.073 (sys:      0.009 wall:      0.017)
Task4...............................:      0.000 (sys:      0.000 wall:      0.000)
Task5...............................:      0.123 (sys:      0.012 wall:      0.040)
Task6...............................:      0.000 (sys:      0.000 wall:      0.000)
Function Evaluations................:    170.455 (sys:      0.200 wall:    144.390)
 Objective function.................:      0.006 (sys:      0.000 wall:      0.006)
 Objective function gradient........:      0.011 (sys:      0.000 wall:      0.012)
 Equality constraints...............:     56.277 (sys:      0.107 wall:     30.109)
 Inequality constraints.............:      0.075 (sys:      0.000 wall:      0.075)
 Equality constraint Jacobian.......:      0.042 (sys:      0.000 wall:      0.042)
 Inequality constraint Jacobian.....:      7.358 (sys:      0.023 wall:      7.386)
 Lagrangian Hessian.................:    106.685 (sys:      0.069 wall:    106.760)

EXIT: Optimal Solution Found.

No, I mean optimization variables, design parameters, decision variables, whatever you want to call them. If you have n decision variables, your Hessian matrix is n × n, and computing the Hessian in general requires O(n) times the cost of evaluating your objective function (e.g. by forward over reverse mode) — so as n grows, the Hessian computational cost dominates, and eventually becomes impractical. (Of course, there are special cases where obtaining the Hessian is cheap… I’m speaking of a general nonlinear objective.) That’s why large-scale nonlinear optimization eventually must start using things like low-rank Hessian approximations (quasi-Newton methods), or abandoning the use of the Hessian altogether (first-order methods).

Here, it looks like you have n = 7784, which seems large for computing explicit Hessians. On the other hand, your diagram suggests that your Hessian is quite sparse, so potentially you could exploit that.

(I’m just speaking in general… I haven’t deciphered your code fragments to understand the specific problem you are solving, and you haven’t given any high-level mathematical description.)


Thank you very much for explaining this. From my understanding, JuMP only uses a single core to compute the Hessian. I have come across ExaModels.jl + MadNLP, whose first line says this:

ExaModels.jl employs what we call SIMD abstraction for nonlinear programs (NLPs), which allows for the preservation of the parallelizable structure within the model equations, facilitating efficient, parallel reverse-mode automatic differentiation on the GPU accelerators .

I have access to a good GPU. Do you think this could run faster using ExaModels on the GPU?
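For context, this is the kind of model-building code ExaModels expects (a hedged sketch following the API shown in the ExaModels README; the toy objective and constraint here are made up, and the GPU line assumes CUDA.jl and MadNLPGPU are installed):

```julia
using ExaModels, MadNLP
# For GPU: also `using CUDA, MadNLPGPU` and build the core with a GPU backend.

c = ExaCore()                 # or ExaCore(; backend = CUDABackend()) for the GPU
x = variable(c, 10; start = 0.5)

# Objectives and constraints are given as generators over an index set;
# this SIMD structure is what ExaModels differentiates in parallel.
objective(c, (x[i] - 1)^2 for i in 1:10)
constraint(c, x[i] + x[i+1] for i in 1:9; lcon = 0.0, ucon = 1.5)

m = ExaModel(c)
result = madnlp(m)            # solves on the backend the model was built on
```

The catch for a model like mine is that every constraint family has to be expressed in this generator form rather than through arbitrary Julia functions.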

I will try to put a high-level mathematical description here.

I think this describes best what I am doing. I am not a student in that class; I just like their description :D. Is it sufficient?