Providing Objective, Gradient and Hessian to the Ipopt wrapper using a single function?

Hello,

I want to provide analytic expressions for the objective function, the gradient and the Hessian of the problem using a single function instead of three different functions.

I am using Ipopt via the C wrapper here.

I would like to do this in order to reuse common calculations. In the MWE below there are no common calculations, but the goal is to have a single function that can compute all three quantities depending on what is needed at each iteration (some iterations do not require the Hessian, for example).

# Based on the HS071 example
# min x * y * z^2
# st  x + y + z = 2

# Start at (0.0, 0.0, 0.0)
# End at (1.0, 1.00, 0.00)

using Ipopt

n = 3                            # this global is used by some functions
x_L = -Inf*ones(n)
x_U = +Inf*ones(n)

m = 1
g_L = [2.0]
g_U = [2.0]

function eval_f(x)
    return x[1] * x[2] * (x[3]^2)
end

function eval_g(x, g)
    g[1] = x[1] + x[2] + x[3]
    return
end

function eval_grad_f(x, grad_f)
    grad_f[1] = x[2] * (x[3]^2)
    grad_f[2] = x[1] * (x[3]^2)
    grad_f[3] = 2 * x[1] * x[2] * x[3]
    return
end

function eval_jac_g(x, mode, rows, cols, values)
    if mode == :Structure
        # Constraint (row) 1
        rows[1] = 1; cols[1] = 1
        rows[2] = 1; cols[2] = 2
        rows[3] = 1; cols[3] = 3
    else
        # Constraint (row) 1
        values[1] = 1.0  # 1,1
        values[2] = 1.0  # 1,2
        values[3] = 1.0  # 1,3
    end
    return
end

function eval_h(x, mode, rows, cols, obj_factor, lambda, values)
    if mode == :Structure
        # Symmetric matrix, fill the lower left triangle only
        idx = 1
        for row = 1:n
            for col = 1:row
                rows[idx] = row
                cols[idx] = col
                idx += 1
            end
        end
    else

        # Again, only lower left triangle
        # Objective
        values[1] = 0                                   # 1,1
        values[2] = obj_factor * (x[3]^2)               # 2,1
        values[3] = 0                                   # 2,2
        values[4] = obj_factor * (2 * x[2] * x[3])      # 3,1
        values[5] = obj_factor * (2 * x[1] * x[3])      # 3,2
        values[6] = obj_factor * (2 * x[1] * x[2])      # 3,3

    end
end

prob = createProblem(
    n,
    x_L,
    x_U,
    m,
    g_L,
    g_U,
    3,     # number of nonzeros in the constraint Jacobian
    6,     # number of nonzeros in the Hessian of the Lagrangian
    eval_f,
    eval_g,
    eval_grad_f,
    eval_jac_g,
    eval_h,
)

# Set starting solution
prob.x = [0.0, 0.0, 0.0]

# Solve
status = solveProblem(prob)

println(Ipopt.ApplicationReturnStatus[status])
println(prob.x)
println(prob.obj_val)

Can the code above be modified to use only one function that spits out what the solver needs in each iteration? Also, can this function be written to avoid using globals like n?

Thank you all.

This isn't exactly what you're asking for, but I use Ipopt directly (almost exclusively) for my own optimization problems and worked out a solution that's at least in the spirit of what you're asking for. What I do is create a cache of some kind for repeated work, and then my function calls for the objective/gradient/Hessian look like this:

function ipopt_grad_example(x)
  if haskey(CACHE, hash(x))
    return extract_gradient_part(CACHE[hash(x)])
  else
    add_to_cache(CACHE, x, ...)
    return ipopt_grad_example(x)
  end
end

Perhaps not the most elegant, but it works pretty well. In my use case, I do second-ish order optimization in such a way that if I'm computing the gradient, I can get my Hessian approximation basically for free. So I always lump those calls together and just cache the Hessian estimate in case Ipopt wants it. You can see my use case here, although that code has been aggressively dialed in to the point where it might not make for easy casual reading (read: I don't write good collaborative code... sorry).
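
To make that concrete with the toy problem from the first post, a minimal sketch of the pattern could look something like the following. The Workspace struct and update! function are placeholder names I made up; the idea is just that one routine does all the shared work once per point and the callbacks only read from it (inside eval_h you would still multiply the cached values by obj_factor).

mutable struct Workspace
    key::UInt64             # hash of the last x that was evaluated
    f::Float64              # cached objective value
    grad::Vector{Float64}   # cached gradient
    hess::Vector{Float64}   # cached lower-triangle Hessian values
end

Workspace(n) = Workspace(UInt64(0), NaN, zeros(n), zeros(div(n * (n + 1), 2)))

# do all of the shared work once per point; every callback goes through here
function update!(ws::Workspace, x)
    h = hash(x)
    h == ws.key && return ws            # same x as the last call, reuse everything
    a, b, c = x[1], x[2], x[3]
    ws.f = a * b * c^2
    ws.grad .= (b * c^2, a * c^2, 2 * a * b * c)
    ws.hess .= (0.0, c^2, 0.0, 2 * b * c, 2 * a * c, 2 * a * b)  # same order as eval_h above
    ws.key = h
    return ws
end

const WS = Workspace(3)

cached_eval_f(x) = update!(WS, x).f
cached_eval_grad_f(x, grad_f) = (copyto!(grad_f, update!(WS, x).grad); nothing)
# eval_h would copy WS.hess and scale it by obj_factor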

If this isn't clear and nobody else comes in with a better suggestion, do let me know. I spent a lot of time playing with this particular problem and it would be gratifying to help somebody else spend less time doing it.

3 Likes

Thank you. Is the cache some global that you keep around?

If I'm creating the Ipopt problem in the global scope, then yes, although I declare it with a const so that it doesn't hurt performance (example). But in general, for anything beyond the simplest examples, the pattern I use more often is setting up my problem inside another function, in which case it's even less of a concern to create a CACHE object inside that function for the objective/gradient/Hessian to check in on at each call. If you were worried about having a const global like that, you could always write an optimize(...) wrapper function that takes your objective/gradient/etc., creates a cache, declares the necessary callback wrappers, and then actually calls Ipopt.
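
Something like this rough sketch, again using the toy problem from the first post (the run_ipopt name and the Dict cache are made up; the point is that the callbacks are closures over n and cache, so no globals are needed):

using Ipopt

function run_ipopt(x0::Vector{Float64})
    n, m = length(x0), 1
    cache = Dict{UInt64,Float64}()            # shared scratch space, local to this call

    # all of these are closures, so they can see n and cache without globals
    function eval_f(x)
        get!(cache, hash(x)) do
            x[1] * x[2] * x[3]^2              # expensive shared work would go here
        end
    end
    eval_g(x, g) = (g[1] = x[1] + x[2] + x[3]; nothing)
    function eval_grad_f(x, grad_f)
        grad_f .= (x[2] * x[3]^2, x[1] * x[3]^2, 2 * x[1] * x[2] * x[3])
        return
    end
    function eval_jac_g(x, mode, rows, cols, values)
        if mode == :Structure
            rows .= 1
            cols .= (1, 2, 3)
        else
            values .= 1.0
        end
        return
    end
    function eval_h(x, mode, rows, cols, obj_factor, lambda, values)
        if mode == :Structure
            idx = 1
            for row in 1:n, col in 1:row
                rows[idx] = row; cols[idx] = col; idx += 1
            end
        else
            values .= obj_factor .* (0.0, x[3]^2, 0.0,
                                     2 * x[2] * x[3], 2 * x[1] * x[3], 2 * x[1] * x[2])
        end
        return
    end

    prob = createProblem(n, fill(-Inf, n), fill(Inf, n), m, [2.0], [2.0],
                         3, 6, eval_f, eval_g, eval_grad_f, eval_jac_g, eval_h)
    prob.x = copy(x0)
    status = solveProblem(prob)
    return status, prob.x, prob.obj_val
end

status, xsol, fsol = run_ipopt([0.0, 0.0, 0.0])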

2 Likes

Just use Ipopt via JuMP. You're more likely to make a mistake with the derivatives if you hand-code them.

1 Like

@odow it's true. But in some cases the model building takes too long, or JuMP returns OutOfMemoryError(), which is my case.

Try

model = direct_model(Ipopt.Optimizer())
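
For example, a sketch along these lines (direct mode skips the extra cached copy of the model that JuMP normally keeps, which is where the memory saving comes from; the attribute values and sizes below are just placeholders):

using JuMP, Ipopt

# direct_model talks to the solver directly instead of keeping JuMP's
# internal cached copy of the model
model = direct_model(Ipopt.Optimizer())
set_optimizer_attribute(model, "max_iter", 2000)
set_optimizer_attribute(model, "mu_strategy", "adaptive")

# variables, constraints and the objective are then declared exactly as before
@variable(model, 0.0 <= θ[r = 1:30] <= 1.0)
@constraint(model, sum(θ) == 1.0)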
1 Like

Thank you @cgeoga, I marked it as a solution. The cache will consume extra memory, but I think it's the best we can do.

Is it safe to assume that Ipopt calls the objective, gradient and Hessian in this order?

Unfortunately I'm not confident about the answer to that question. Perhaps @odow knows? To be honest, I'm really not an optimization person; I have very little understanding of what Ipopt is doing under the hood. I feel you about the extra memory consumption of the cache. I should have said that my problems are very low-dimensional, so I'm not too bothered by storing hundreds of 5 × 5 matrices. The caching fix might not be suitable if you have a very high-dimensional problem. In that case, maybe it would make sense to keep trying with JuMP as @odow suggests; I don't know how much it does for nonlinear problems, but presumably it's pretty good at being careful about extra memory use.

1 Like

No.

But in some cases the model building takes too long or JuMP returns

How are you building the problem? There might be some simple optimizations that make it better.

@odow, an MWE of what I am trying to do is shown below. I still didn't try direct_model, sorry.

The code below works for, say, M = 50, but for M = 500 it runs out of memory. My actual application would use something like M = 7500.

Also, not related to memory, but the derivative checker reports a bunch of errors compared to Ipopt's numerical derivatives.

using Random, Distributions, LinearAlgebra
using JuMP, Ipopt, Tullio, Plots
using StaticArrays

## SOME FUNCTIONS

function pack(θ, γmat)
    γvec = γmat[:]
    Θ = vcat(θ, γvec)
    return Θ
end

function unpack(Θ, R, H, J)
    θ = Θ[1:R]
    γmat = reshape(Θ[R+1:end], H, J)
    return θ, γmat
end

function gridβ(R, β, σ)
    minβ = β + σ * quantile.(Normal(), 0.01)
    maxβ = β + σ * quantile.(Normal(), 0.99)
    gridβ = LinRange(minβ, maxβ, R)
    gridβ = collect(gridβ)
    return gridβ
end

## FAKE DATA

# increase this variable
const M = 50                  # markets

# setup
const T = 12                  # time periods
const H = 3                   # demographic characteristics
const J = 4                   # inside goods
const K = 1                   # product characteristics
const N = M*T                 # observations
const R = 30                  # grid points

# taste distribution parameters
β₀ = rand(K)            # averages
σ₀ = rand(K)            # standard errors (positive)

# demographics parameters
γ₀ = randn(H,J);        # stores [γ₁',...,γⱼ']

# product and demographics characteristics
const x = [ @SMatrix rand(K,J)                 for i=1:N]      # x[i][j,k]
const w = [ SVector{H}( vcat(1.0,rand(H-1)) )  for i=1:N]      # w[i][h]

# true grid
š”…ā‚€ = gridĪ²(R,Ī²ā‚€[1],Ļƒā‚€[1])'[:,:]    # K Ɨ R
Īøā‚€ = pdf.(Normal(Ī²ā‚€[1],Ļƒā‚€[1]), š”…ā‚€)'[:]
Īøā‚€ /= sum(Īøā‚€)

# purchases
const xĪ²ā‚€ = [ SMatrix{J,R}(x[i]'*š”…ā‚€) for i=1:N]
sh = [ exp.(Ī³ā‚€'*w[i] .+ x[i]'*š”…ā‚€)       for i=1:N]
sh = [ vcat(sh[i] ./ (1 .+ sum(sh[i],dims=1)), 1 ./ (1 .+ sum(sh[i],dims=1)) )    for i=1:N]
sh = [ sum(Īøā‚€[r]*sh[i][:,r] for r=1:R)  for i=1:N]
const y = [ SVector{J+1}(round.(rand(300:600) .* sh[i]))   for i=1:N]
L  = [ sum(y[i])                        for i=1:N]

# random parameter grid
š”… = gridĪ²(R,Ī²ā‚€[1],Ļƒā‚€[1])'[:,:];    # K Ɨ R

# pre-calculates xĪ²Ź³
xĪ² = [ SMatrix{J,R}(x[i]'*š”…) for i=1:N]      # xĪ²[i][j,r]

## STARTING VALUES

# startĪø = pdf.(TriangularDist(minimum(š”…),maximum(š”…)), š”…)'[:]
startĪø = pdf.(Normal(Ī²ā‚€[1],0.5), š”…ā‚€)'[:]
startĪø ./= sum(startĪø)
startĪ³ = Ī³ā‚€ + 0.2*randn(H,J)
startĪ˜ = pack(startĪø,startĪ³)

## OPTIMIZATION WITH IPOPT + JUMP

m = Model(optimizer_with_attributes(
    Ipopt.Optimizer,
    "max_iter"              => 2000,
    "mu_strategy"           => "adaptive",
    "derivative_test"       => "second-order",
    "tol"                   => 1e-6,
    "hessian_approximation" => "exact"
))

@variable(m, 0.0 <= θ[r=1:R] <= 1.0, start = startθ[r])
@variable(m, -10.0 <= γ[h=1:H,j=1:J] <= 10.0, start = startγ[h,j])
@constraint(m, sum(θ) == 1.0)

@NLobjective(m, Max,
    sum(

        sum(

            y[i][j] * log(
                sum(
                    θ[r] * (exp( xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H) ) / (1.0 + sum( exp( xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H) ) for j=1:J)))
                    for r=1:R
                )

            )
            for j=1:J
        )

        +

        y[i][J+1] * log(
            sum(
                θ[r] * (1.0 / (1.0 + sum( exp( xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H) ) for j=1:J)))
                for r=1:R
            )

        )
        for i=1:N
    )/N
)

@time @show JuMP.optimize!(m)
mleθ = JuMP.value.(θ)
mleγ = JuMP.value.(γ)

# plotting results
plot(𝔅₀',θ₀, label="true distribution",xlabel="grid",ylabel="prob",title="JuMP + IPOPT")
plot!(𝔅₀',startθ, label="starting points")
plot!(𝔅₀',mleθ,label="estimated")
heatmap(abs.(mleγ .- γ₀))

First, read: Performance Tips · The Julia Language

Just wrap everything in a function. You have a lot of non-constant globals. Use @code_warntype on your function and look at the lines in red at the top; they are variables that aren't type stable. Fix them. One example is θ₀ /= sum(θ₀), which you should probably write as θ₀ ./= sum(θ₀) (in-place modification).
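
For example, a toy snippet (not taken from the MWE) showing what to look for:

using InteractiveUtils            # provides @code_warntype outside the REPL

θ₀ = rand(5)                      # non-const global
unstable() = θ₀ / sum(θ₀)         # the compiler cannot infer the type of θ₀ here

@code_warntype unstable()         # the Any annotations printed in red are the problem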

A good structure is something like

function build_data(H, J, K, N, R)
    # ... stuff
    return R, H, J, N, startθ, startγ, w, xβ, γ
end

function solve_model(R, H, J, N, startθ, startγ, w, xβ, γ)
    # ... stuff
    return m
end

data = build_data(H, J, K, N, R)
solve_model(data...)

Try extracting similar expressions (I may have made a typo here):

    @NLexpression(
        m,
        nlexpr[i=1:N,j=1:J,r=1:R],
        exp(xβ[i][j,r] + sum(w[i][h]*γ[h,j] for h=1:H))
    )
    @NLexpression(
        m,
        nlexpr2[i=1:N,r=1:R],
        θ[r] / (1.0 + sum(nlexpr[i,j,r] for j=1:J))
    )
    @NLobjective(m, Max,
        sum(
            sum(
                y[i][j] * log(sum(nlexpr2[i,r] * nlexpr[i,j,r] for r=1:R))
                for j=1:J
            ) +
            y[i][J+1] * log(sum(nlexpr2[i,r] for r=1:R))
            for i=1:N
        ) / N
    )

The derivative errors are probably coming from the accuracy of the data and from how dense your problem is:

Number of nonzeros in equality constraint Jacobian...:       30
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:      903

Total number of variables............................:       42
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       42
                     variables with only upper bounds:        0
Total number of equality constraints.................:        1
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

Obligatory xkcd: Coordinate Precision.

1 Like

Will do that, thank you @odow. The comic was funny 🙂