JuMP, large expression, and hessian "limited-memory" option in Ipopt

Hi all,

I have a question regarding JuMP, which I try to use to solve an optimal control problem. For this problem, I have to generate a quite complicated function using symbolic computations. Then, I build a nonlinear @expression in JuMP using this function, which I need in my optimization problem formulation. It works well but this step may take some time because the generated function may be quite lengthy.

Eventually, I noticed that my hessian is not very sparse: I typically get >400,000 non-zeros elements according to Ipopt logs. In practice, Ipopt iterations can be much faster when using set_attribute(model, “hessian_approximation”, “limited-memory”) and convergence rate seems to improve overall in my case.

I was wondering if there a way to tell JuMP that I don’t want to compute the hessian information, especially if I end up using the “limited-memory” Ipopt option? Could it also speed up the line starting with @expression? (does function tracing mean that AD is fully performed at this stage already?)

Sorry for the possibly naive questions, and thanks for any insights!

1 Like

Hi @niro, welcome to the forum!

I was wondering if there a way to tell JuMP that I don’t want to compute the hessian information

To check if your settings make a difference, you can check the number of functions calls.

With:

using JuMP, Ipopt
function main(method)
    model = Model(Ipopt.Optimizer)
    @variable(model, x)
    @variable(model, y)
    @objective(model, Min, (1 - x)^2 + 100 * (y - x^2)^2)
    set_attribute(model, "hessian_approximation", method)
    optimize!(model)
end

I get

julia> main("exact")
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        3

... lines omitted ...

Number of objective function evaluations             = 36
Number of objective gradient evaluations             = 15
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 14
Total seconds in IPOPT                               = 0.008

julia> main("limited-memory")
This is Ipopt version 3.14.14, running with linear solver MUMPS 5.6.2.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

... lines omitted ...

Number of objective function evaluations             = 47
Number of objective gradient evaluations             = 25
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 0
Total seconds in IPOPT                               = 0.043

The key difference is Number of nonzeros in Lagrangian Hessian and Number of Lagrangian Hessian evaluations, which are 0 if you set the limited-memory option. This shows that JuMP is not computing the Hessian, or passing it to Ipopt.

Could it also speed up the line starting with @expression?

Nope

does function tracing mean that AD is fully performed at this stage already

Nope.

It works well but this step may take some time because the generated function may be quite lengthy.

For improving the speed of @expression, we can likely make some improvements. Can you provide a reproducible example of your code?

I really should write a tutorial on optimal control, with some common tricks for performance.

You could take a read of: Help solving: Adding a few equations to ODE slows JuMP >100x

Or look at

2 Likes

Thanks @odow for the answers. This clarifies my understanding.

Yes, I was aware of these optimal control packages in Julia. I tried them and it’s nice to see such developments, but I’m not sure that they would scale much better if the dynamics is a very large expression. I’ve read this post (JuMP: Constraint with a huge expression), where a similar issue is discussed.

I cannot give a MWE now but I can improve the description. Basically I do preliminary calculations using Symbolics.jl, use build_function() before I switch to JuMP to formulate my optimal control problem. In the symbolic part, I essentially compute a Jacobian matrix, and solve a linear system to automatically build the f term in the system \dot{x} = f(x,u). This function f may be a huge expression with dozens of dimensions.

I managed to speed up the @expression line by adding subexpressions and variables in JuMP, but it requires splitting the problem in smaller pieces (typically I would generate a function for the Jacobian matrix, and others for the right and left hand-sides of the linear system to avoid solving it (that is, adding a constraint like Ax=b rather than x=A\b). It does speed up the building of the JuMP model but Ipopt convergence is less good in general (perhaps because there are more unknowns in the resulting NLP).

For my problem, I was also wondering if there is a way to form a Jacobian expression from an existing expression in JuMP?
(I’ve seen this post: JuMP: use AD within a JuMP model to define a derivative-based constraint but I’m not sure if this could be relevant in my case).

Thanks for the help.

Just as a heads up, we’re unlikely to be able to provide much advice without one. As is often the case, “it depends” on the specifics of your model.

Note that the reproducible example doesn’t have to be your exact large-scale problem. Try to come up with a small example that has the same structure as your real problem, so that hopefully, you can apply the suggested improvements to the small example to your larger problem.

I don’t really understand what you mean by and solve a linear system to automatically build the f term. Is the linear system symbolic?

that is, adding a constraint like Ax=b

Yes, this would be my suggested approach.

I was also wondering if there is a way to form a Jacobian expression from an existing expression in JuMP?

Nope

Is there a reason why you need symbolic computations? These can end up being pretty costly due to expression swell, especially in large dimensions

I managed to extract a MWE, which should be readable enough.

import Symbolics
using JuMP, Ipopt, LinearAlgebra
using Plots

function vech(L::AbstractMatrix{T}) where {T}
    n = LinearAlgebra.checksquare(L)
    c = zeros(T, n * (n + 1) ÷ 2)
    k = 0
    for j = 1:n, i = j:n
        k += 1
        c[k] = L[i, j]
    end
    return c # half-vectorization of L
end

function unvech(c::AbstractVector{T}) where {T}
    ns2 = length(c)
    n = Int((sqrt(1 + 8ns2) - 1) / 2)
    L = zeros(T, n, n)
    k = 0
    for j = 1:n, i = j:n
        k = k + 1
        L[i, j] = c[k]
    end
    return L # lower-triangular matrix built from vector c
end

function dynamics(x, u, t, p)
    g, σ = p # unpack parameters    
    f = [x[2], -u[1] * x[1] + g * sin(x[1])] # drift    
    G = [0, σ] # diffusion
    return f, G
end

function solve_ocp()
    # Dimensions
    nx = 2
    nu = 1
    nc = (nx * (nx + 1)) ÷ 2
    nz = nx + nc

    # General settings
    σ = 0.1
    g = 1
    T = 2.0  # Number of seconds
    N = 201  # Number of time steps
    dt = T / (N - 1)

    # Parameters tuple
    p = (g, σ)

    #------ Symbolic calculations
    Symbolics.@variables t
    Symbolics.@variables x[1:nx]
    x = Symbolics.scalarize(x)
    Symbolics.@variables c[1:nc]
    c = Symbolics.scalarize(c)
    Symbolics.@variables u[1:nu]
    u = Symbolics.scalarize(u)

    L = unvech(c)
    P = L * L'

    f, G = dynamics(x, u, t, p)

    # Computation of symbolic Jacobian matrix
    A = Symbolics.jacobian(f, x)
    dP = A * P + P * A' + G * G'
    invL = inv(L)
    H = invL * dP * invL'

    dx = f
    dc = vech(L * (LowerTriangular(H) - Diagonal(H) / 2))

    z = vcat(x, c)
    dz = vcat(dx, dc)

    F_expr = Symbolics.build_function(dz, z, u, t; expression=Val{false}, nanmath=false)[1]

    # initial state
    zini = [0, 0, 1e-3, 0, 1e-3]

    # Set Bounds
    umin = [1e-4]
    umax = [10]
    zmin = [-10, -10, 1e-4, -10, 1e-4]
    zmax = [10, 10, 10, 10, 10]

    #------ Build JuMP model
    model = Model(Ipopt.Optimizer)

    # Create JuMP variables
    @variables(model, begin
        zmin[i] ≤ z[i=1:nz, j=1:N] ≤ zmax[i]
        umin[i] ≤ u[i=1:nu, j=1:N] ≤ umax[i]
    end)

    # Create useful expressions
    @expression(model, t[j=1:N], (j - 1) * dt)
    @expression(model, P[j=1:N], unvech(1.0z[nx+1:end, j]) * unvech(1.0z[nx+1:end, j])')
    # /!\ the line below can be time-consuming with bigger models /!\
    @expression(model, F[i=1:nz, j=1:N], F_expr(z[:, j], u[:, j], t[j])[i])

    # Set constraints
    @constraint(model, f̃_const[j=1:N-1], z[:, j+1] - z[:, j] - dt * 0.5 * (F[:, j] + F[:, j+1]) .== 0)
    @constraint(model, xend_const[i=1:nx], z[i, end] == 0.0)

    # Fix initial state
    fix.(z[:, 1], zini; force=true)

    # Generate and set initial guess
    zguess = repeat(zini, 1, N)
    uguess = repeat(ones(nu), 1, N)
    set_start_value.(z, zguess)
    set_start_value.(u, uguess)

    # Objective
    R = diagm(ones(nu))
    Q = 1e4 * diagm(ones(nx))
    @expression(model, cost[j=1:N], z[1:nx, j]' * Q * z[1:nx, j] + tr(Q * P[j]) + u[:, j]' * R * u[:, j])
    @objective(model, Min, sum(dt * 0.5(cost[j] + cost[j+1]) for j = 1:N-1))

    # Optimize!
    optimize!(model)

    # Extraction of the optimal control
    uopt = Array(value.(u))

    return uopt

end

uopt = solve_ocp()

plot(uopt')

I emphasized the line taking more time with larger models…
I’m not sure whether I can avoid the symbolic Jacobian calculation (or whether I can replace it with some AD solution, but I don’t know how to do that in combination with JuMP).

Thank you for your advices.

This is the cost of re-evaluating a JuMP nonlinear expression through the symbolic Jacobian built by Symbolics. But you also evaluate it nz times when it needs only to be evaluated once.

I would write your model as something like:

    model = Model(Ipopt.Optimizer)
    @variables(model, begin
        zmin[i] <= z[i in 1:nz, j in 1:N] <= zmax[i], (start = zini[i])
        umin[i] <= u[i in 1:nu, j in 1:N] <= umax[i], (start = 1.0)
        F[1:nz, 1:N]
        cost[1:N]
    end)
    fix.(z[:, 1], zini; force = true)
    fix.(z[1:nx, end], 0.0; force = true)
    @expression(model, P[j=1:N], unvech(1.0z[nx+1:end, j]) * unvech(1.0z[nx+1:end, j])')
    @constraints(model, begin
        [j in 1:N], F[:, j] .== F_expr(z[:, j], u[:, j], t[j])
        [j in 1:N-1], z[:, j+1] - z[:, j] - dt * 0.5 * (F[:, j] + F[:, j+1]) .== 0
        [j in 1:N], cost[j] == z[1:nx, j]' * Q * z[1:nx, j] + tr(Q * P[j]) + u[:, j]' * R * u[:, j]
    end)
    @objective(model, Min, sum(dt * 0.5(cost[j] + cost[j+1]) for j in 1:N-1))
    optimize!(model)
    @assert is_solved_and_feasible(model)
    return value.(u)

It doesn’t make much of a difference on this example, but it might on larger ones.

Thanks @odow.

And if I replace:

@expression(model, F[i=1:nz, j=1:N], F_expr(z[:, j], u[:, j], t[j])[i])

with

@expression(model, F[j=1:N], F_expr(z[:, j], u[:, j], t[j]))

is it supposed to be equivalent to what you propose or not? (in terms of speed I mean). Because I wrote it like that in my larger models…

With a larger model btw, the @constraint line with your approach takes 587 seconds to execute (first run), then 0.44 seconds (second run). Using the line above with an @expression takes 574 seconds (first run). The second run is also super fast. So it’s similar apparently…

Note that I managed to get 68 seconds to build the model with another approach that uses an implicit formulation f(\dot x, x, u, t)=0 instead of an explicit formulation as above… but then I need the \dot x variable in the JuMP model and Ipopt convergence does not seem as good for some reason…

Questions:

  • can decomposing a huge expression into smaller ones that are then composed together within JuMP be faster? (instead of calling one single @expression line, I can build several intermediate JuMP expressions and then do some linear algebra in JuMP to get the same huge expression… that is, is there a chance it can be faster if I maximize the number of operations done in JuMP directly?)

  • I don’t understand why it would be better to handle the cost as you propose. Any hint? (yet this part was never an issue in my codes)

Anyway, the good point is that I do not run out of memory and that the second run is always really fast :slight_smile:

is it supposed to be equivalent to what you propose or not? (in terms of speed I mean). Because I wrote it like that in my larger models…

Yes, that’s an alternative. I prefer making F a variable, because otherwise you have lots of large repeated expressions in the constraints.

With a larger model btw, the @constraint line with your approach takes 587 seconds to execute (first run), then 0.44 seconds (second run)

This is not a JuMP problem, but the compilation overhead of executing the build_function output of Symbolics.jl.

can decomposing a huge expression into smaller ones that are then composed together within JuMP be faster?
is there a chance it can be faster if I maximize the number of operations done in JuMP directly?

Yes. You want to minimize the usage of Symbolics.

I don’t understand why it would be better to handle the cost as you propose

It probably actually doesn’t make a difference.

1 Like