Best way to define and re-use multiple JuMP nonlinear models

I have a for loop where at each iteration I solve multiple versions of a similar nonlinear problem with some parameter updates.

I tried 3 ways to define the base model(s) that I call at each iteration before updating the parameters. I am confused by my benchmark results, and in particular by why the timings differ so much between them.

In the JuMP models there are the variables \theta_j, a nonlinear function of these variables P(n) = \sum_j \theta_j \cos(n f j), and nonlinear parameters \pi_k that are changed at each loop iteration. This part is common to every model.
The differences are as follows:
Model 1: I define one model with all S nonlinear expressions NL_exp[s], which are used later in the @NLobjective.
Model 2: I define S models, each with its own @NLobjective defined through a nonlinear expression.
Model 3: I define S models, each with its own @NLobjective written directly in the objective.

Here is a working example of the models:

using JuMP, Ipopt, StatsBase, BenchmarkTools, Random

function model_1(γ::AbstractVector, non_zero_terms::AbstractArray, d::Int, T::Int; silence = true)
    N = size(γ, 1)
    S = size(non_zero_terms, 1)

    model = Model(Ipopt.Optimizer)
    silence && set_silent(model)
    
    f = 2π/T
    trig = [cos(f*j*(n%T)) for n in 1:N, j in 0:d]

    # Unknown variables
    @variable(model, θ_jump[j = 1:d+1])

    # Polynomial of trigonometric functions
    @NLexpression(model, P[n = 1:N], sum(trig[n, j] * θ_jump[j] for j in 1:d+1))

    # Some nonlinear parameters that will be changed at each call of the model
    @NLparameter(model, πk[n = 1:N] == γ[n])

    # nonlinear expression that will be my objective
    @NLexpression(model, NL_exp[s = 1:S],
        sum(πk[n]*log1p(exp(-P[n])) for n in non_zero_terms[s,1]) + sum(πk[n]*log1p(exp(P[n])) for n in non_zero_terms[s,2])
    )

    # I register the nonlinear parameters to my model to be able to change it later
    model[:πk] = πk
    model[:NL_exp] = NL_exp # I register all the nonlinear expression and will call them later

    return model
end


function model_2(γ::AbstractVector, non_zero_terms::AbstractArray, d::Int, T::Int; silence = true)
    N = size(γ, 1)

    model = Model(Ipopt.Optimizer)
    silence && set_silent(model)
    
    f = 2π/T
    trig = [cos(f*j*(n%T)) for n in 1:N, j in 0:d]

    # Unknown variables
    @variable(model, θ_jump[j = 1:d+1])

    # Polynomial of trigonometric functions
    @NLexpression(model, P[n = 1:N], sum(trig[n, j] * θ_jump[j] for j in 1:d+1))

    # Some nonlinear parameters that will be changed at each call of the model
    @NLparameter(model, πk[n = 1:N] == γ[n])

    # nonlinear expression that will be my objective
    @NLexpression(model, NL_exp,
        sum(πk[n]*log1p(exp(-P[n])) for n in non_zero_terms[1]) + sum(πk[n]*log1p(exp(P[n])) for n in non_zero_terms[2])
    ) 

    # nonlinear objective
    @NLobjective(model, Min,
        NL_exp
    )

    # I register the nonlinear parameters to my model to be able to change it later
    model[:πk] = πk
    
    return model
end

function model_3(γ::AbstractVector, non_zero_terms::AbstractArray, d::Int, T::Int; silence = true)
    N = size(γ, 1)

    model = Model(Ipopt.Optimizer)
    silence && set_silent(model)
    
    f = 2π/T
    trig = [cos(f*j*(n%T)) for n in 1:N, j in 0:d]

    # Unknown variables
    @variable(model, θ_jump[j = 1:d+1])

    # Polynomial of trigonometric functions
    @NLexpression(model, P[n = 1:N], sum(trig[n, j] * θ_jump[j] for j in 1:d+1))

    # Some nonlinear parameters that will be changed at each call of the model
    @NLparameter(model, πk[n = 1:N] == γ[n])

    # nonlinear objective directly with the nonlinear expression
    @NLobjective(model, Min,
        sum(πk[n]*log1p(exp(-P[n])) for n in non_zero_terms[1]) + sum(πk[n]*log1p(exp(P[n])) for n in non_zero_terms[2])
    )

    # I register the nonlinear parameters to my model to be able to change it later
    model[:πk] = πk
    
    return model
end

Here are the optimize functions:


function optimize_1!(θ_0::AbstractMatrix, model::Model, γ::AbstractVector, s::Int; warm_start = true)
    N = size(γ, 1)
    θ_jump = model[:θ_jump]
    πk = model[:πk]
    NL_exp = model[:NL_exp]
    [set_value(πk[n], γ[n]) for n in 1:N]
    warm_start && set_start_value.(θ_jump, θ_0[s,:])

    # Set the objective to the s-th NLexpression defined in model_1
    @NLobjective(model, Min, NL_exp[s])
    optimize!(model)
    θ_0[s,:] = value.(θ_jump)
end

function optimize_2_3!(θ_0::AbstractVector, model::Model, γ::AbstractVector; warm_start = true)
    N = size(γ, 1)
    θ_jump = model[:θ_jump]
    πk = model[:πk]

    [set_value(πk[n], γ[n]) for n in 1:N]
    warm_start && set_start_value.(θ_jump, θ_0[:])

    optimize!(model)
    θ_0[:] = value.(θ_jump)
end

Here are the benchmarks:

Random.seed!(1234)
N = 1000
S = 20
d = 3
T = 100

# Some indices n do not appear in the objective / NL expressions
# I distinguish between two types of data for the two terms in my objective / NL expression
non_zero_terms_type_1 = [sample(1:N, N÷2, replace = false, ordered = true) for s in 1:S]
non_zero_terms_type_2 = [sample(setdiff(1:N, non_zero_terms_type_1[s]), N÷4, replace = false, ordered = true) for s in 1:S]
non_zero_terms = hcat(non_zero_terms_type_1, non_zero_terms_type_2)

γ = rand(N)
θ_0 = randn(S, d + 1)

# Run once before the for loops
mod_1 = model_1(γ, non_zero_terms, d, T)
mod_2 = [model_2(γ, non_zero_terms[s,:], d, T) for s in 1:S]
mod_3 = [model_3(γ, non_zero_terms[s,:], d, T) for s in 1:S]

# for 
# At each iteration of the loop the γ are updated (not shown here)

# then I run the optimize function for all s
@btime [optimize_1!(copy($θ_0), $mod_1, $γ, s) for s in 1:S]
  1.247 s (1673006 allocations: 225.50 MiB)
@btime [optimize_2_3!(view(copy($θ_0), s, :), $mod_2[s], $γ) for s in 1:S]
  3.810 s (1671943 allocations: 225.41 MiB)
@btime [optimize_2_3!(view(copy($θ_0), s, :), $mod_3[s], $γ) for s in 1:S]
  2.409 s (1670443 allocations: 221.44 MiB)
# end

Can someone please help me make sense of the timings? Why are the differences so big, and what is the best way to go?
Model 1 looks to me like the least clean approach, yet it seems to be the fastest here (on some other instances it is the slowest).
Why are models 2 and 3 so different?

Moreover, when I look at the time reported by Ipopt, it seems that around half the time is spent inside the solver actually solving the model. What is the other half, and is it possible to reduce it?
For example, with s = 5:

Total CPU secs in IPOPT (w/o function evaluations)   =      0.278
Total CPU secs in NLP function evaluations           =      0.050

but solve_time(mod_2[s]) is 0.4930000305175781

I don’t have much to say here.

Different approaches will perform differently. Model 3 might be a bit slower because it doesn't have the expression? I'd need to take a closer look.

If model 1 is the fastest, go with that. Some of the solve time includes JuMP preparing derivative information, copying the model to Ipopt, etc. We currently rebuild the model every solve, so we aren’t caching information in Ipopt.jl. See https://github.com/jump-dev/JuMP.jl/issues/1185.

Note that there’s no need to wrap statements in []. That just leads to an allocation of the vector.

[set_value(πk[n], γ[n]) for n in 1:N]

Just use

for n in 1:N
    set_value(πk[n], γ[n])
end
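
If you prefer broadcasting, something like

set_value.(πk, γ)

should also work (set_value takes a single nonlinear parameter and a value, so the dot broadcasts over the vector), and it likewise avoids building the throwaway vector.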

Different approaches will perform differently, sure; I just thought that since the final @NLobjective is exactly the same for all models, the run time should be the same.
I think the Ipopt.jl time is indeed similar, while the 'JuMP preparing derivatives' time varies (in a way I don't understand).

I am surprised by the link: according to the JuMP manual section "When to use a nonlinear parameter", just updating the nonlinear parameters is beneficial, but you are saying that it is not.
Too bad for me!

just updating the nonlinear parameters is beneficial, but you are saying that it is not.

There are two components:

  1. Building the JuMP model with derivative information
  2. Building the Ipopt.jl model from the JuMP model

Nonlinear parameters fix 1 but not 2. (If you modify a parameter, we don’t need to reconstruct the derivatives in the JuMP model, but we do need to build a new Ipopt.jl model.) The open issue is asking for 2.
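
A quick way to see the two components with the functions from your example (just a sketch; rand(N) stands in for whatever updated parameter values you would use):

model = model_2(γ, non_zero_terms[1, :], d, T; silence = false)
optimize!(model)                 # first solve: builds derivatives and the Ipopt.jl model
set_value.(model[:πk], rand(N))  # cheap: component 1 (JuMP derivatives) is cached
optimize!(model)                 # re-solve: still rebuilds the Ipopt.jl model (component 2)
solve_time(model)                # includes that rebuild, so it exceeds Ipopt's reported CPU time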