Using JuMP for variational data assimilation -- scalability and memory usage

Hi! I have been using JuMP over the last few months to perform variational data assimilation. Specifically, I am trying to estimate ODE parameters by direct transcription of the equations into a nonlinear program, utilizing the regularizing control introduced by Abarbanel et al here, and to be solved using IPOPT.

I have code to accomplish this in JuMP, which works reasonably well. However, when comparing against some example code found online here, I found that the JuMP implementation uses significantly more memory for the same problem size (e.g. nearly 16GB RAM for a problem using only 2-3GB when run wih the example code). Moreover, when running the same JuMP code on a computing cluster, even more memory seems to be needed (e.g. code that runs fine on my laptop with 16GB RAM gives an out of memory exception when running on the cluster with the same memory budget, and only succeeds when allocating ~30GB). Direct inspection of the Hessian suggests that JuMP has determined the correct sparsity pattern, so I am not entirely sure where the disparity originates. I also find that the JuMP implementation frequently throws “Warning: error in step computation” while this rarely if ever happens with the example implementation.

Does anyone have an idea as to where the disparity might arise? To my knowledge, the example implementation computes derivatives symbolically while JuMP uses autodiff, but would the difference be that large (especially with expression swell for symbolic derivatives)? Any help would be greatly appreciated! (and I’m happy to share my code if needed!)

1 Like

Can you provide a reproducible example of the code? (Or at the very least, the log of an Ipopt solve?) It’s hard to offer generic advice because these things tend to be quite problem-specific.

I didn’t look in detail at the example code, but it looks like it is compiling some custom C++ to run Ipopt? That probably explains the difference in memory.

It also looks like their docker image uses HSL. What linear solver are you using? (If unsure, could you post the output of the Ipopt log?)

Instead of using Ipopt.jl, you could try AmplNLWriter:

using JuMP, AmplNLWriter, Ipopt_jll
model = Model(() -> AmplNLWriter.Optimizer(Ipopt_jll.amplexe))

That uses a different automatic differentiation engine, which may improve performance.

@odow, thanks for your reply! I have attached a reproducible example of the code. I tried to upload my Ipopt log and options as well, but these are evidently disallowed file formats for the Discourse forum; please let me know if I should attempt to work around this (e.g. by saving them with a .jl extension).

hh_assimilation_test_slow.jl (5.7 KB)
varassim_jump.jl (5.4 KB)

The parameter estimation that I set up in the code is for a relatively small (4 dimensional) model with a short assimilation window (300 ms of observed data) and uses about ~10GB RAM when running on my laptop. In contrast, the example code works with a much larger (12 dimensional) model, a wider assimilation window (1500 ms of observed data) and still uses only about ~5GB RAM throughout the optimization. When trying to scale up to a larger model and assimilation window using the JuMP implementation, I hit memory limitations.

I am using the HSL ma97 solver for these experiments. Should I expect there to be a way to achieve the same kind of memory performance with JuMP, or is the custom C++ really the way to go here? Ideally, I’d like to work within JuMP/Julia as in this setting it is simpler to immediately inspect the solutions found by Ipopt after the fact, and the code describing the model is more readable.

Thanks again!

I tried to upload my Ipopt log

Just copy-paste into the text box? You don’t need to include all the iterations. The most important part is the header and the footer.

Here’s the copy-pasted log:

This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit

This is Ipopt version 3.14.4, running with linear solver ma97.

Number of nonzeros in equality constraint Jacobian...:   319130
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:  1595652

Total number of variables............................:    32924
                     variables with only lower bounds:        0
                variables with lower and upper bounds:    32924
                     variables with only upper bounds:        0
Total number of equality constraints.................:    26320
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  3.4778537e+03 2.99e+02 1.20e-02   0.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  3.4761913e+03 2.83e+02 1.02e+00  -4.5 3.95e-01    -  5.22e-02 5.51e-02h  1


 233  9.8445332e-03 3.19e-09 2.87e-10 -11.0 1.05e-03    -  1.00e+00 1.00e+00h  1
 234  9.8442897e-03 8.00e-10 3.74e-11 -11.0 3.29e-04    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 234

                                   (scaled)                 (unscaled)
Objective...............:   9.8442897119442197e-03    9.8442897119442197e-03
Dual infeasibility......:   3.7436955983792730e-11    3.7436955983792730e-11
Constraint violation....:   3.9841536605300383e-11    7.9979400879892637e-10
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   3.6796092331437666e-11    3.6796092331437666e-11
Overall NLP error.......:   3.9841536605300383e-11    7.9979400879892637e-10

Number of objective function evaluations             = 241
Number of objective gradient evaluations             = 235
Number of equality constraint evaluations            = 241
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 235
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 234
Total seconds in IPOPT                               = 164.437

EXIT: Optimal Solution Found.
* Solver : Ipopt

* Status
  Result count       : 1
  Termination status : LOCALLY_SOLVED
  Message from the solver:

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : 9.84429e-03
  Dual objective value : -2.54268e-02

* Work counters
  Solve time (sec)   : 1.68743e+02

Here’s the code in a single file.

It might take me a while to actually understand what’s going on, but there are a few red flags that stand out. You should essentially never need to use eval or invoke latest, and the mix of DifferentialEquations and Symbolics with JuMP can cause issues. This code is also very far from what the C++ is doing, so it’s no wonder that there is a performance difference.

How long does their Ipopt code take to run? And is the RAM issue here because of the solve or during the setup? Asked a different way, if you remove optimize!, is there still an issue?

using DifferentialEquations
using Symbolics
using Plots
using Parameters
using JuMP
using Ipopt
using StaticArrays
using Interpolations
using IOCapture
using JLD2

struct VarAssimMeta

function VarAssimMeta(

    L = length(observed_dims)
    I = length(driven_dims)
    Symbolics.@variables u[1:D], p[1:P], f[1:I]
    vf_syms = vf(u, p, f)
    vf_strs = repr.(vf_syms)
    VarAssimMeta(vf, vf_strs, D, P, observed_dims, driven_dims, L, I, lb_states, ub_states, lb_params, ub_params)

function jump_nl_substitute!(model, eqstr, prec, urec, frec)
    npts = size(frec)[2]
    @assert size(urec)[2] == npts
    find_pats = [Regex("$x" * raw"\[(\d+)\]") for x in ["p", "u", "f"]]
    repl_pats = [SubstitutionString("opt_p[\\1]"), SubstitutionString("opt_u[\\1,j]"), SubstitutionString("drive[\\1,j]")]
    neweqstr = replace(eqstr, [find_pats[i] => repl_pats[i] for i in eachindex(find_pats)]...)
    model_lambda_str = """(model, opt_p, opt_u, drive) -> @NLexpression(
    model_lambda = eval(Meta.parse(model_lambda_str))
    reshape(Base.invokelatest(model_lambda, model, prec, urec, frec), (1, npts))

abstract type AbstractVADiscretization end

struct SimpsonHermite <: AbstractVADiscretization end

function gen_dspe_jump_model(vam::VarAssimMeta, disc::SimpsonHermite, timepoints, data, drive_frozen)
    N = length(timepoints) - 1
    dt = diff(timepoints)
    @unpack vf_strs, D, P, observed_dims, lb_states, ub_states, lb_params, ub_params = vam
    L = length(observed_dims)
    model = Model()
    JuMP.@variables(model, begin
        opt_x[1:D, 1:(N+1)]
        opt_xm[1:D, 1:N]
        0.0 <= opt_k[1:(2*N+1)] <= 1.0
    drive = vcat(drive_frozen, opt_k')
    data_x = data[:, 1:2:(2*N+1)]
    data_xm = data[:, 2:2:(2*N)]
    drive_x = drive[:, 1:2:(2*N+1)]
    drive_xm = drive[:, 2:2:(2*N)]
    println("Generating timepoint and midpoint vector field subexpressions")
    F = vcat([jump_nl_substitute!(model, eqstr, opt_p, opt_x, drive_x) for eqstr in vf_strs]...)
    Fm = vcat([jump_nl_substitute!(model, eqstr, opt_p, opt_xm, drive_xm) for eqstr in vf_strs]...)
    println("Generating Simpson residual subexpressions.")
    S = @NLexpression(
        [i = 1:D, j = 1:N],
        (opt_x[i, j] + (1 / 6) * dt[j] * F[i, j]) + (2 / 3) * dt[j] * Fm[i, j] + (-opt_x[i, j+1] + (1 / 6) * dt[j] * F[i, j+1])
    println("Generating Hermite residual subexpressions.")
    H = @NLexpression(
        [i = 1:D, j = 1:N],
        ((1 / 2) * opt_x[i, j] + (dt[j] / 8) * F[i, j]) + ((1 / 2) * opt_x[i, j+1] - (dt[j] / 8) * F[i, j+1]) - opt_xm[i, j],
    @NLconstraint(model, [i = 1:D, j = 1:N], S[i, j] == 0.0)
    @NLconstraint(model, [i = 1:D, j = 1:N], H[i, j] == 0.0)
    for j = 1:D
        set_lower_bound.(opt_x[j, :], lb_states[j])
        set_lower_bound.(opt_xm[j, :], lb_states[j])

        set_upper_bound.(opt_x[j, :], ub_states[j])
        set_upper_bound.(opt_xm[j, :], ub_states[j])
    for i = 1:P
        set_lower_bound(opt_p[i], lb_params[i])
        set_upper_bound(opt_p[i], ub_params[i])
            sum((opt_x[i, j] - data_x[i, j])^2 for i in observed_dims, j = 1:(N+1)) +
            sum((opt_xm[i, j] - data_xm[i, j])^2 for i in observed_dims, j = 1:N) +
            sum(opt_k[j]^2 for j = 1:(2*N+1))
        ) / (2 * N + 1)
    println("Model generation complete!")
gtp = [1.0, -54.4, -77.0, 50.0, 0.3, 20.0, 120.0, -40.0, 15.0, 0.1, 0.4, 0.0, -60.0, -15.0, 1.0, 7.0, 0.0, -55.0, 30.0, 1.0, 5.0, 0.0];
function lorenz(u, p, t)
    dx = 10.0 * (u[2] - u[1])
    dy = u[1] * (28.0 - u[3]) - u[2]
    dz = u[1] * u[2] - (8 / 3) * u[3]
    SA[dx, dy, dz]
function scale_timeseries(ts, minv, maxv)
    tsmax = maximum(ts)
    tsmin = minimum(ts)
    minv .+ (maxv - minv) * (ts .- tsmin) / (tsmax - tsmin)
lor_tmax = 80.0
lor_dt = 0.005
lor_u0 = SA[-1.31; 0.8; 19.77]
lor_tspan = (0.0, lor_tmax)
lor_prob = ODEProblem(lorenz, lor_u0, lor_tspan)
lor_sol = solve(lor_prob, RK4(), adaptive = true, dt = lor_dt);
lorfunc = linear_interpolation(scale_timeseries(lor_sol.t, 0.0, 300.0), scale_timeseries(lor_sol[1, :], -20.0, 20.0), extrapolation_bc = 0.0);
const tanhgain = 1.0;
@inline nlss(V, θ, σ) = 0.5 * (1 + tanh(tanhgain * (V - θ) / σ))
@inline nltc(V, θ, σ, τ0, τ1, τ2) = τ0 + τ1 * (1 - tanh(tanhgain * (V - θ) / σ)^2) + τ2 * (1 + tanh(tanhgain * (V - θ) / σ))
function hhvf_oop(u, p, t)
    pvec, Iapp = p
    Cm, EL, EK, ENa, gL, gK, gNa, θm, σm, τm0, τm1, τm2, θh, σh, τh0, τh1, τh2, θn, σn, τn0, τn1, τn2 = pvec
    V, m, h, n = u
    dV = (1 / Cm) * (gNa * m * m * m * h * (ENa - V) + gK * n * n * n * n * (EK - V) + gL * (EL - V) + Iapp(t))
    dm = (nlss(V, θm, σm) - m) / nltc(V, θm, σm, τm0, τm1, τm2)
    dh = (nlss(V, θh, σh) - h) / nltc(V, θh, σh, τh0, τh1, τh2)
    dn = (nlss(V, θn, σn) - n) / nltc(V, θn, σn, τn0, τn1, τn2)
    [dV, dm, dh, dn]
hh_tmax = 300.0
hh_saveat = 0.04
hh_u0 = [
hh_prob = ODEProblem(hhvf_oop, hh_u0, (0.0, hh_tmax), (gtp, lorfunc))
hh_sol = solve(hh_prob, RK4(), adaptive = true);
function hhvf_assim_controlled(u, p, f)
    Cm, Isa, EL, EK, ENa, gL, gK, gNa, θm, σm, τm0, τm1, τm2, θh, σh, τh0, τh1, τh2, θn, σn, τn0, τn1, τn2 = p
    I, data, k = f
    V, m, h, n = u
    dV = (1 / Cm) * (gNa * m * m * m * h * (ENa - V) + gK * n * n * n * n * (EK - V) + gL * (EL - V) + I / Isa) + k * (data - V)
    dm = (nlss(V, θm, σm) - m) / nltc(V, θm, σm, τm0, τm1, τm2)
    dh = (nlss(V, θh, σh) - h) / nltc(V, θh, σh, τh0, τh1, τh2)
    dn = (nlss(V, θn, σn) - n) / nltc(V, θn, σn, τn0, τn1, τn2)
    [dV, dm, dh, dn]
plower = [1.0, 0.0005, -100.0, -100.0, 0.0, 0.0, 0.0, 0.0, -100.0, 0.01, 0.01, 0.01, 0.0, -100.0, -100.0, 0.01, 0.01, 0.0, -100.0, 0.01, 0.01, 0.01, 0.0]
pupper = [1.0, 1.0, 0.0, 0.0, 100.0, 100.0, 1000.0, 1500.0, -0.01, 100.0, 10.0, 10.0, 0.0, -0.01, -0.01, 10.0, 10.0, 0.0, -0.01, 100.0, 10.0, 10.0, 0.0]
slower = [-150.0, 0.0, 0.0, 0.0]
supper = [150.0, 1.0, 1.0, 1.0];
vam = VarAssimMeta(hhvf_assim_controlled, 4, 23, [1], [1, 2, 3], slower, supper, plower, pupper)
timepoints = hh_sol.t
N = length(timepoints) - 1
midt = (hh_sol.t[1:end-1] + hh_sol.t[2:end]) / 2.0;
allt = vcat(vec(hcat(hh_sol.t[1:end-1], midt)'), hh_sol.t[end]);
middatamat = hcat(hh_sol.(midt)...);
alldatamat = hcat(hh_sol.(allt)...);
obsdatamat = alldatamat[1:1, :];
drive = lorfunc.(allt)';
noise = 0.1 * randn(2N + 1);
data = obsdatamat + noise';
drive_frozen = vcat(drive, data);
model = gen_dspe_jump_model(vam, SimpsonHermite(), timepoints, data, drive_frozen)
set_optimizer(model, Ipopt.Optimizer)

Thanks for taking a look at the code! For what it’s worth, I think in this code DifferentialEquations, Symbolics, and JuMP are not interacting too strongly. The pipeline here is, roughly:

  • given a vector field function with signature vf(u,p,f) where u is the ODE state, p are time-independent parameters, and f are time-dependent parameters (e.g. a driving force for the ODE), use Symbolics to trace the function and obtain a string describing each component function of the vector field (in terms of indexing into arrays u, p, and f)
  • pass these equation strings to jump_nl_substitute! along with prec, urec, and frec which are Vector{VariableRef} registered for the JuMP model model. This function then generates JuMP nlexpressions which amount to substituting the variables in the rec vectors into the strings produced by step 1. I was not able to find a simpler way to perform this step, so if you have any suggestions for making this cleaner, they’d be much appreciated!
  • use the nlexpressions from the previous step to build residuals for Simpson-Hermite discretization of the ODE solution, which give the nonlinear constraints for the JuMP model. This work is done inside the function gen_dspe_jump_model.
  • finally, use DifferentialEquations to integrate the ODE based on ground-truth parameter values in vector gtp and obtain “observed data.” Add some noise to this data and pass it into the model constructor given by function gen_dspe_jump_model. We now have the JuMP model ready to be optimized.
  • if all goes well with the optimization, the solution will give estimated parameters close to the ground-truth values in gtp.

To my understanding, the C++ code is solving the same optimization problem, but the constraint Jacobian/Hessian derivatives are supplied by symbolic differentiation of the constraint equations (as opposed to the sparse reverse autodiff built into JuMP). For the problem size I am ultimately interested in (approx ~12 dim vector field with 1000 ms assimilation window), the example C++ Ipopt code takes about 3 hours to run in total. For comparison, the code I attached above runs a sample problem with a 4 dim vector field and 300 ms assimilation window; this runs quite fast in the above example, taking only ~3 min, but I don’t think the runtime scales linearly by any means. Model generation using JuMP is quite fast and uses little memory – the RAM issue is entirely during the solve.

Ah. You can use the raw expression input: Nonlinear Modeling · JuMP. But you can’t trace functions yet.

We’re in the process of fixing that:

The other thing you could try is:

import MathOptSymbolicAD
JuMP.optimize!(model; _differentiation_backend = MathOptSymbolicAD.DefaultBackend())

See Improving nonlinear programming support in JuMP | Oscar Dowson | JuliaCon 2022 - YouTube.

But I don’t think you’ll be able to get close to the hand-rolled C++ code, since they hard-code the exact Hessian:

Thanks for your reply. A couple of questions:

Would you be able to briefly explain why hardcoding the Hessian makes such a big difference in memory usage? I’m likely not understanding how exactly JuMP’s autodiff works, but it seems to me that if JuMP is tracing the operations exactly, the derivative functions passed to Ipopt should be essentially the same as the hardcoded ones (except perhaps for some structural zeros that are undetected – and maybe this makes a huge difference?).

If I hardcode the Hessian in Julia and interface with e.g. NLPModels.jl, could I expect to achieve similar performance to the custom C++ code?

Finally, a question about MathOptSymbolicAD: the constraints in my problem are all nearly of a repeated “template” form, but each constraint has a numerical parameter which is distinct across constraints (namely, the value of the driving force at each timestep); is MathOptSymbolicAD able to detect the “template” in this situation?

This is not the case. JuMP traces the function operations exactly, but it uses an algorithm to compute the gradient and hessian values.

By default, JuMP uses reverse-mode automatic differentiation to compute the gradient of the objective and the Jacobian of the constraints, and then it uses forward-mode automatic differentiation through the reverse-mode algorithm to compute the Hessian each function (Google “forward-over-reverse”). This requires a lot of steps and a lot of additional allocations to store the intermediate values.

The implementation is in MathOptInterface.jl/src/Nonlinear/ReverseAD at master · jump-dev/MathOptInterface.jl · GitHub if you want to poke around.

As shown in Benchmarking Nonlinear Optimization with AC Optimal Power Flow | Carleton Coffrin | JuliaCon 2022 - YouTube, our implementation is reasonably fast for most use-cases, but it hits some limitations, particularly when the number of non-zeros in the Hessian is large (you have 1.6e6 non-zeros, and this will likely get much worse as you scale).

You should be able to find plenty of material online that explains automatic differentiation, but here are a couple of links:

is MathOptSymbolicAD able to detect the “template” in this situation?

Yes. And with MathOptSymbolicAD, the Hessian functions provided to Ipopt should be pretty similar to the hard-coded. The downside is that there is a potentially expensive setup phase where we compute the symbolic derivatives.

1 Like