Does a very large number of observed variables (3207 vs 1188 equations) significantly increase ODEProblem memory?

Hi SciML team and community,

I’m developing PowerEMT.jl, an open-source Julia package for Electromagnetic Transient (EMT) simulation of large power systems using ModelingToolkit.jl, specifically for IBR-dominated power systems. The package relies heavily on hierarchical component-based modeling with many nested @named subsystems (inverters, generators, transmission lines, controllers, etc.).

After applying structural_simplify (and attempting mtkcompile), one of my test systems reports the following structure:

```text
Equations (1188):
  1188 standard: see equations(sys)
Unknowns (1188): see unknowns(sys)
  ⋮ (e.g. vg25₊iq(t), vg25₊id(t), vg25₊ucq(t), ...)
Parameters (2160): see parameters(sys)
  ⋮
Observed (3207): see observed(sys)
```

The resulting ODEProblem is extremely large — approximately 437 MiB.

My main question is:

Does such a high number of observed variables (here ~2.7× the number of equations) significantly contribute to the memory footprint of the generated ODEProblem? Specifically, does it inflate the size of the compiled RHS function and the ObservedFunctionCache?

In power system EMT models, most of these observed variables are internal measurements automatically generated by sub-components (voltages, currents, powers, etc.). For typical use cases, we only need a small subset of them for post-processing and analysis.

  • Is this memory overhead expected for hierarchical models of this scale?
  • Are there recommended ways to minimize or eliminate the compilation of unnecessary observed functions while keeping the core 1188 differential/algebraic equations intact?
  • Would moving non-essential observed equations out of the system (and computing them manually after solve()) be an effective strategy for a library like PowerEMT.jl?
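To make the third question concrete, here is a minimal generic sketch of what I mean by "computing them manually after solve()". This is a toy system, not the PowerEMT.jl model; the variable names are placeholders:

```julia
using ModelingToolkit, OrdinaryDiffEq

@variables t x(t)
D = Differential(t)

# Tiny stand-in system: only the state equation is registered
@mtkcompile sys = System([D(x) ~ -x], t)
prob = ODEProblem(sys, [x => 1.0], (0.0, 1.0))
sol = solve(prob, Tsit5())

# Derived "measurement" computed by hand from the state trajectory,
# instead of carrying it as an observed equation inside the system
p_inst = sol[x] .^ 2
```

Since the derived quantity never enters the compiled system, no observed code would be generated for it.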

Any insights, best practices, or suggestions for reducing this overhead before the first public release of PowerEMT.jl would be greatly appreciated!

Thanks in advance!

Nope, the observed equations are only evaluated on demand, so the solver memory is w.r.t. the unknowns and independent of the observed.
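For example (a generic toy sketch, not the model above): indexing the solution with an observed variable reconstructs it from the unknowns at access time, rather than storing it during the solve:

```julia
using ModelingToolkit, OrdinaryDiffEq

@variables t x(t) y(t)
D = Differential(t)

# y is algebraic, so compilation moves it into the observed set
@mtkcompile sys = System([D(x) ~ -x, y ~ 2x], t)
prob = ODEProblem(sys, [x => 1.0], (0.0, 1.0))
sol = solve(prob, Tsit5())

sol[y]  # evaluated on demand from sol[x]; not saved by the solver
```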

Are you doing jac=true? With sparse=true? Usually if the memory is big, it's from dense Jacobians.

Thank you Chris (and Aayush too) for the quick and clear reply!

Just to confirm the setup I’m using:

```julia
@mtkcompile sys = System(eqs, t, [], []; systems = systems)
prob = ODEProblem(sys, [], (0.0, 5.0); jac = true, sparse = true)
sol = solve(prob, TRBDF2(linsolve = KLUFactorization());
            reltol = 1e-5, abstol = 1e-5,
            maxiters = 100, dt = 1e-3)  # power simulation
```

Yes, jac=true + sparse=true + KLUFactorization as you suggested. The second and all subsequent solves are amazingly fast.

The bottleneck is purely the creation of the ODEProblem itself (it takes a long time, and the resulting prob is huge: ~437 MiB, as noted above). Aayush mentioned that a lot of the compile time is spent generating code for the observed variables, which makes sense because in our model the number of observed variables is currently ~2.7× the number of states/equations.
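For anyone wanting to reproduce the measurement: I'm estimating construction time and problem size roughly as follows (Base.summarysize for the size; the toy system here is just a stand-in for the real model):

```julia
using ModelingToolkit, OrdinaryDiffEq

@variables t x(t)
D = Differential(t)
@mtkcompile sys = System([D(x) ~ -x], t)

# Time problem construction and estimate its in-memory size
@time prob = ODEProblem(sys, [x => 1.0], (0.0, 1.0); jac = true, sparse = true)
println(Base.summarysize(prob) / 2^20, " MiB")
```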

I’m going to refactor the model to reduce the number of observed variables and see whether that shrinks both the problem size and the compilation time. I’ll report back once I have numbers.

Thanks again for the insight — really appreciate the help!
