Providing Objective, Gradient and Hessian to the Ipopt wrapper using a single function?


I want to provide analytic expressions for the objective function, the gradient and the Hessian of the problem using a single function instead of three different functions.

I am using Ipopt via the C wrapper in here.

I would like to do so in order to reuse common calculations. In the MWE I show there are no common calculations, but the goal is to have a single function that can perform the three things depending on what is needed in each iteration (some iterations do not calculate the Hessian, for example).

# HS071
# min x * y * z^2
# st  x + y + z = 2

# Start at (0.0, 0.0, 0.0)
# End at (1.0, 1.00, 0.00)

using Ipopt

n = 3                            # this global is used by some functions
x_L = -Inf*ones(n)
x_U = +Inf*ones(n)

m = 1
g_L = [2.0]
g_U = [2.0]

function eval_f(x)
    return x[1] * x[2] * (x[3]^2)

function eval_g(x, g)
    g[1] = x[1] + x[2] + x[3]

function eval_grad_f(x, grad_f)
    grad_f[1] = x[2] * (x[3]^2)
    grad_f[2] = x[1] * (x[3]^2)
    grad_f[3] = 2 * x[1] * x[2] * x[3]

function eval_jac_g(x, mode, rows, cols, values)
    if mode == :Structure
        # Constraint (row) 1
        rows[1] = 1; cols[1] = 1
        rows[2] = 1; cols[2] = 2
        rows[3] = 1; cols[3] = 3
        # Constraint (row) 1
        values[1] = 1.0  # 1,1
        values[2] = 1.0  # 1,2
        values[3] = 1.0  # 1,3

function eval_h(x, mode, rows, cols, obj_factor, lambda, values)
    if mode == :Structure
        # Symmetric matrix, fill the lower left triangle only
        idx = 1
        for row = 1:n
            for col = 1:row
                rows[idx] = row
                cols[idx] = col
                idx += 1

        # Again, only lower left triangle
        # Objective
        values[1] = 0                                   # 1,1
        values[2] = obj_factor * (x[3]^2)               # 2,1
        values[3] = 0                                   # 2,2
        values[4] = obj_factor * (2 * x[2] * x[3])      # 3,1
        values[5] = obj_factor * (2 * x[1] * x[3])      # 3,2
        values[6] = obj_factor * (2 * x[1] * x[2])      # 3,3


prob = createProblem(

# Set starting solution
prob.x = [0.0, 0.0, 0.0]

# Solve
status = solveProblem(prob)


Can the code above be modified to use only one function that spits out what the solver needs in each iteration? Also, can this function be written to avoid using globals like n?

Thank you all.

This isn’t exactly what you’re asking for, but I use Ipopt directly myself almost exclusively for my optimization problems and worked out a solution that’s at least in the spirit of what you’re asking for. What I do is create a cache of some type for repeated work, and then my function calls for the objective/gradient/Hessian look like this:

function ipopt_grad_example(x)
  if haskey(CACHE, hash(x))
    return extract_gradient_part(CACHE[hash(x)])
    add_to_cache(CACHE, x, ...)
    return ipopt_grad_example(x)

Perhaps not the most elegant, but it works pretty well. In my use case, I do second-ish order optimization in such a way that if I’m computing the gradient, I can get my Hessian approximation basically for free. So I always lump those calls and just cache the Hessian estimate in case Ipopt wants it. You can see my use case here, although that code has been aggressively dialed in to the point where it might not be the most easy casual reading (read: I don’t write good collaborative code…sorry).

If this isn’t clear and nobody else comes in with a better suggestion, do let me know. I spent a lot of time playing with this particular problem and it would be gratifying to help somebody else spend less time doing it.


Thank you. Is the cache some Global that you keep around?

If I’m creating the Ipopt problem in the global scope, then yes—although I declare it with a const so that it doesn’t hurt performance (example). But in general for things beyond the simplest examples, the code pattern that I experience more often is setting up my problem inside another function, in which case it’s even less of a concern to create a CACHE object inside the function that the objective/gradient/Hessian will check in on at each function call. If you were concerned about having a const global like that, you could always wrap an optimize(...) function that eats your objective/gradient/etc, creates a cache, declares the necessary wrappers, and then actually calls Ipopt.


Just use Ipopt via JuMP. You’re more likely to make a mistake with the derivatives if you hand-code them.

1 Like

@odow it’s true. But in some cases the model building takes too long or JuMP returns OutOfMemoryError(), which is my case.


model = direct_model(Ipopt.Optimizer())
1 Like

Thank you @cgeoga, I marked it as a solution. The cache will consume extra memory, but I think it’s the best we can do.

Is it safe to assume that Ipopt calls the objective, gradient and Hessian in this order?

Unfortunately I’m not confident about the answer to that question. Perhaps @odow knows? To be honest, I’m really not an optimizer—I have very little understanding about what Ipopt is doing under the hood. I feel you about the extra memory consumption of the cache—I should have said, but my problems are very low-dimensional. So I’m not super bothered by storing hundreds of 5 \times 5 matrices. The caching fix might not be suitable if you have a very high dimensional problem. In that case, maybe it would make sense to keep trying with JuMP like @odow suggests—I don’t know how much it does for nonlinear problems, but presumably it’s pretty good at being careful about extra memory use and stuff.

1 Like


But in some cases the model building takes too long or JuMP returns

How are you building the problem? There might be some simple optimizations that make it better.

@odow, an MWE of what I am trying to do is shown below. Still didn’t try to use the direct_model, sorry.

The code below works for, say, M = 50, but for M = 500 it runs out of memory. My actual application would use something like M = 7500.

Also, not related to memory, but the derivative checker points a bunch of errors compared to Ipopt numerical derivatives.

using Random, Distributions, LinearAlgebra
using JuMP, Ipopt, Tullio, Plots
using StaticArrays


function pack(θ,γmat)
    γvec = γmat[:]
    Θ = vcat(θ,γvec)
    return Θ

function unpack(Θ,R,H,J)
    θ = Θ[1:R]
    γmat = reshape(Θ[R+1:end],H,J)
    return θ,γmat

function gridβ(R,β,σ)
    minβ = β + σ * quantile.(Normal(), 0.01)
    maxβ = β + σ * quantile.(Normal(), 0.99)
    gridβ = LinRange(minβ,maxβ,R)
    gridβ = collect(gridβ)
    return gridβ


# increase this variable
const M = 50                  # markets

# setup
const T = 12                  # time periods
const H = 3                   # demographic characteristics
const J = 4                   # inside goods
const K = 1                   # product characteristics
const N = M*T                 # observations
const R = 30                  # grid points

# taste distribution parameters
β₀ = rand(K)            # averages
σ₀ = rand(K)            # standard errors (positive)

# demographics parameters
γ₀ = randn(H,J);        # stores [γ₁',...,γⱼ']

# product and demographics characteristics
const x = [ @SMatrix rand(K,J)                 for i=1:N]      # x[i][j,k]
const w = [ SVector{H}( vcat(1.0,rand(H-1)) )  for i=1:N]      # w[i][h]

# true grid
𝔅₀ = gridβ(R,β₀[1],σ₀[1])'[:,:]    # K × R
θ₀ = pdf.(Normal(β₀[1],σ₀[1]), 𝔅₀)'[:]
θ₀ /= sum(θ₀)

# purchases
const xβ₀ = [ SMatrix{J,R}(x[i]'*𝔅₀) for i=1:N]
sh = [ exp.(γ₀'*w[i] .+ x[i]'*𝔅₀)       for i=1:N]
sh = [ vcat(sh[i] ./ (1 .+ sum(sh[i],dims=1)), 1 ./ (1 .+ sum(sh[i],dims=1)) )    for i=1:N]
sh = [ sum(θ₀[r]*sh[i][:,r] for r=1:R)  for i=1:N]
const y = [ SVector{J+1}(round.(rand(300:600) .* sh[i]))   for i=1:N]
L  = [ sum(y[i])                        for i=1:N]

# random parameter grid
𝔅 = gridβ(R,β₀[1],σ₀[1])'[:,:];    # K × R

# pre-calculates xβʳ
xβ = [ SMatrix{J,R}(x[i]'*𝔅) for i=1:N]      # xβ[i][j,r]


# startθ = pdf.(TriangularDist(minimum(𝔅),maximum(𝔅)), 𝔅)'[:]
startθ = pdf.(Normal(β₀[1],0.5), 𝔅₀)'[:]
startθ ./= sum(startθ)
startγ = γ₀ + 0.2*randn(H,J)
startΘ = pack(startθ,startγ)


m = Model(optimizer_with_attributes(
    "max_iter"              => 2000,
    "mu_strategy"           => "adaptive",
    "derivative_test"       => "second-order",
    "tol"                   => 1e-6,
    "hessian_approximation" => "exact"

@variable(m, 0.0 <= θ[r=1:R] <= 1.0, start = startθ[r])
@variable(m, -10.0 <= γ[h=1:H,j=1:J] <= 10.0, start = startγ[h,j])
@constraint(m, sum(θ) == 1.0)

@NLobjective(m, Max,


            y[i][j] * log(
                    θ[r] * (exp( xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H) ) / (1.0 + sum( exp( xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H) ) for j=1:J)))
                    for r=1:R

            for j=1:J


        y[i][J+1] * log(
                θ[r] * (1.0 / (1.0 + sum( exp( xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H) ) for j=1:J)))
                for r=1:R

        for i=1:N

@time @show JuMP.optimize!(m)
mleθ = JuMP.value.(θ)
mleγ = JuMP.value.(γ)

# plotting results
plot(𝔅₀',θ₀, label="true distribution",xlabel="grid",ylabel="prob",title="JuMP + IPOPT")
plot!(𝔅₀',startθ, label="starting points")
heatmap(abs.(mleγ .- γ₀))

First, read: Performance Tips · The Julia Language

Just wrap everything in a function. You have a lot of non constant globals. Use @code_warntype on your function and look at the lines in red at the top. They are variables that aren’t type stable. Fix them. One example is θ₀ /= sum(θ₀), which you should probably write θ₀ ./= sum(θ₀) (in-place modification).

A good structure is something like

function build_data(H, J, K, N, R)
    # ... stuff
    return R, H, J, N, startθ, startγ, w, xβ, γ

function solve_model(R, H, J, N, startθ, startγ, w, xβ, γ)
    # ... stuff
    return m

data = build_data(H, J, K, N, R)

Try extracting similar expressions (I may have made a typo here):

        exp(xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H))
        θ[r] / (1.0 + sum(nlexpr[i,j,r] for j=1:J))
    @NLobjective(m, Max,
                y[i][j] * log(sum(nlexpr2[i,r] * nlexpr[i,j,r] for r=1:R))
                for j=1:J
            ) +
            y[i][J+1] * log(sum(nlexpr2[i,r] for r=1:R))
            for i=1:N
        ) / N

The derivative errors are probably coming from the accuracy of the data and how dense your problem is

Number of nonzeros in equality constraint Jacobian...:       30
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:      903

Total number of variables............................:       42
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       42
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

Obligatory xkcd: Coordinate Precision.

1 Like

Will do that, thank you @odow. The comic was funny :slight_smile: