Unpredictable Compilation Performance of MTK-Generated Functions

Problem Statement: I am reporting a critical bottleneck regarding the compilation and loading stability of functions generated by ModelingToolkit.jl. My experience shows that the complexity of the generated symbolic Jacobian can lead to an “infinite” compilation hang, where performance is no longer correlated with the system’s dimensionality.

Evidence of Performance Instability: I compared two different system configurations, and the results demonstrate that MTK’s symbolic compilation path is highly unpredictable:

  • Case 1 (Sparse, ~2000 dims): A high-dimensional system with a sparse structure. MTK handles this reasonably well; the ODEProblem construction and the first solve call (Jacobian compilation) finish within an acceptable timeframe.

  • Case 2 (Dense, ~200 dims): A much smaller system but with a dense/highly-coupled structure. Despite having 10x fewer variables, the compilation of prob.f.jac (triggered during the first solve) runs for over 30 minutes and fails to complete, consuming excessive memory.

  • The Argument: This inconsistency suggests that the symbolic expression swell generated by MTK for coupled systems creates an AST so complex that it overwhelms the LLVM backend.

  • Compilation vs. Runtime: While MTK aims for fast runtime, the “compilation wall” for dense systems makes the workflow unusable. The time saved during integration is irrelevant if the function takes 30+ minutes just to load.

  • Dimensionality is a Poor Metric: Currently, users cannot predict whether a model will compile based on its size. A smaller, coupled model is significantly harder for MTK to “load” than a larger, sparse one. (A minimal sketch of the dense-coupling pattern follows this list.)
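
To make the pattern concrete, here is a minimal sketch of the dense-coupling structure that triggers the blow-up. This is not my actual model; it assumes MTK v10+ naming (System, mtkcompile; on older versions this would be ODESystem/structural_simplify) and is deliberately pathological: a Kuramoto-style system in which every state appears in every equation, so the symbolic Jacobian has on the order of N² distinct entries.

```
# Dense all-to-all coupling: the symbolic Jacobian has ~N^2 distinct
# trigonometric entries, which is what hits the compilation wall.
using ModelingToolkit, OrdinaryDiffEq
using ModelingToolkit: t_nounits as t, D_nounits as D

N = 50  # already enough to make the first solve visibly slow
@variables x(t)[1:N]
x = collect(x)

eqs = [D(x[i]) ~ -x[i] + sum(sin(x[i] - x[j]) for j in 1:N) / N for i in 1:N]

@named sys = System(eqs, t)
sys = mtkcompile(sys)

prob = ODEProblem(sys, [x[i] => 0.1i for i in 1:N], (0.0, 1.0); jac = true)
sol = solve(prob, Rodas5P())  # the first solve triggers Jacobian compilation
```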

Conclusion: The performance of the symbolic compilation path is currently a “black box.” We need a more robust way to handle or detect when symbolic expansion becomes a liability. Is there a plan to make the generated function overhead more predictable, or to provide a stable fallback when the symbolic Jacobian size exceeds a reasonable complexity threshold?

This looks like LLM-generated output? It’s not allowed to post that directly: Guidelines - Julia Programming Language

The model code is almost always the problem, so for someone to help, you ideally need an MWE or at least some code to understand what is going on.

Anyway, this has come up before. I think packages like GraphDynamics.jl were created to work around it. Maybe Chris can say more.

I know my comment is off-topic, but I just found it ironic that the OP is posting LLM-generated output under a user name containing the word “hardworking”.

This is not generated by an LLM, but the translation was done with an LLM. The complete code is too long; this is what I truly encounter:

```
x_syms = equilibrium_state_syms(model)
p_syms = full_param_syms(model)

residual = vcat(model.F_V, model.f_delta)

N_V     = Symbolics.jacobian(model.F_V, model.V)       # n × n
N_delta = Symbolics.jacobian(model.F_V, model.delta)   # n × (n-1)
Cmat    = Symbolics.jacobian(model.f_delta, model.V)   # (n-1) × n

compiled_sys = mtkcompile(model.sys)

residual_fun = build_function(residual, x_syms, p_syms; expression = Val(false))[1]
M_fun        = build_function(model.M, x_syms, p_syms; expression = Val(false))[1]
F_V_fun      = build_function(model.F_V, x_syms, p_syms; expression = Val(false))[1]
f_delta_fun  = build_function(model.f_delta, x_syms, p_syms; expression = Val(false))[1]

N_V_fun     = build_function(N_V, x_syms, p_syms; expression = Val(false))[1]
N_delta_fun = build_function(N_delta, x_syms, p_syms; expression = Val(false))[1]
C_fun       = build_function(Cmat, x_syms, p_syms; expression = Val(false))[1]
```
For the dense model, F_V_fun and f_delta_fun take a very long time to load.

Now I tend to construct the system completely numerically. I have done a lot of comparisons, as shown below.

```
V0 = v_ms;
δ0 = angs_pv;
u0 = vcat(V0, δ0);

# consistent du0 from explicit RHS
du0 = make_consistent_du0(u0, p, obs, cache)

# residual function
dae_residual! = make_dae_residual(obs, cache)

# check consistency
res0 = similar(u0)
dae_residual!(res0, du0, u0, p, 0.0)
println("initial residual norm = ", norm(res0))

# all states are differential variables
differential_vars = [true for _ in 1:length(u0)]

# time span
tspan = (0.0, 2.0)

# DAE problem
prob = DAEProblem(
    dae_residual!,
    du0,
    u0,
    tspan,
    p;
    differential_vars = differential_vars,
)

# solve with Sundials IDA
sol = solve(
    prob,
    IDA(linear_solver = :Dense),
    reltol = 1e-8,
    abstol = 1e-10,
    saveat = 1e-3,
)

println("retcode = ", sol.retcode)
println("final state = ", sol.u[end])

us = hcat(sol.u...);

fig = Figure()
ax = Axis(fig[1, 1])
[lines!(ax, sol.t, us[k, :]) for k in 1:n]
fig
```

When the model is built purely numerically, it runs very fast. When all node models have identical dynamics, the numerical model is easy to write; but when the node and branch models are diverse, writing the model numerically takes a lot of effort and is not modular. I am not posting the full code because it is too long. I have used MTK for almost a year, and I think this is a really general problem; I want to know whether significant improvements are possible. For now I am switching to a numerical model, especially for somewhat larger models.

How can you tell that this was generated by an LLM???

I’ve been tormented by this problem for the last week, and your answers really annoy me.

```
data = load("EMT-network/case300.jld2")
bus = data["bus"];
branch = data["branch"];
gen = data["gen"];
old_to_new = make_continuous_bus!(bus, branch, gen)

# power data
include("E:/papers/ObGFM_multi/powernet_funcs.jl")
include("E:/papers/ObGFM_multi/obgfm_funcs.jl")
include("E:/papers/ObGFM_multi/obgfm_anal.jl")

slack_bus = findfirst(bus[:, 2] .== 3)  # slack bus
pv_buses = findall(bus[:, 2] .== 2)     # PV buses
pq_buses = findall(bus[:, 2] .== 1)     # PQ buses

Ybus = calc_Ybus_s(bus, branch)

θ0 = deg2rad.(bus[[slack_bus; pv_buses], 9]);
V0 = bus[[slack_bus; pv_buses], 8];
V_re = V0 .* exp.(1im .* (θ0 .+ pi/2));

# initial conditions
Y_red, Y_s, elims = kron_reduce(Ybus, [slack_bus; pv_buses]);
V_eli = Y_s * V_re;
idx = sortperm([elims; slack_bus; pv_buses])
V_bus = [V_eli; V_re][idx];
angs = angle.(V_bus) .- angle(V_bus[slack_bus]);
angs_pv = angs[pv_buses];

V_ext = Y_red * V_re .* (0.005 + 0.01 * 1im) + V_re
minimum(abs.(V_ext))
maximum(abs.(V_ext))
v_ms = abs.(V_ext)
prs = real.(V_ext .* conj.(Y_red * V_re))

keys = Symbol.("vg", [slack_bus; pv_buses]);
gen_types = [GFM_observer_re for _ in [slack_bus; pv_buses]];
vgfs = Dict(keys .=> gen_types);

paras = [(ko = 0.05, kv = 0.1, kp = 0.01, lg = 0.01, rg = 0.005),
         (ko = 0.01, kv = 0.1, kp = 0.01, lg = 0.01, rg = 0.005)];
para = (ko = 0.01, kv = 0.1, kp = 0.01, lg = 0.01, rg = 0.005);
paras = Dict(vg => para for vg in keys);

# === construct reduced Ybus matrix and parameters ===
extensions = [(idx, 1 / (1im * paras[vg].lg + paras[vg].rg)) for (idx, vg) in zip([slack_bus; pv_buses], keys)]
Y_ext, node_map = extend_Ybus(Ybus, extensions)
node_gfms = [node.new for node in node_map]
Y_gfms, _ = kron_reduce(Y_ext, node_gfms);

kos = [paras[vg].ko for vg in keys];
kvs = [paras[vg].kv for vg in keys];
kps = [paras[vg].kp for vg in keys];

data = ObserverData(
    ω_b = 2π * 60.0,
    V0 = v_ms,
    δ0 = angs_pv,
    ko = kos,
    kp = kps,
    kv = kvs,
    pr = prs,
    vr = v_ms,
    G = real(Y_gfms),
    B = imag(Y_gfms),
);

n = length([slack_bus; pv_buses])
model = build_reduced_observer_system(n);
ode_sys = mtkcompile(model.sys)

u0 = state_map(model, data)
p = param_map(model, data)
prob = ODEProblem(ode_sys, vcat(u0, p), (0.0, 2.0), jac = true)
```

This looks like power flow, but I still don’t see any model code?

Can you format your code? You can edit your original comments. You can use triple backticks like this:

/```
my code
```/

Without the /

See also: Please read: make it easier to help you

At the end of the day, it is a (longstanding) limitation of MTK/Symbolics that compilation for large numbers of symbolic expressions (even ODE RHS functions, not just Jacobians) becomes too slow to be useful as the problem size increases. There have been attempts to speed this up in the past (JuliaSimCompiler was one attempt, I think), but I’m not sure where the current efforts stand in addressing this issue. @ChrisRackauckas previously mentioned that one simple fix/hack might be to chunk a function into sub-functions, but I don’t know if this has ever been explored in practice.

Well, according to our experience, MTK 11 is about 50 times faster than MTK 10. So the problem described here must be a very specific test case. Without executable code it is difficult to investigate.

But do you generate MTK models with tens of thousands of expressions? Because that’s the bottleneck being hit here. Which I don’t think MTK v11 addresses either.

Lots of little things, but first a high-level thing: with JuliaSimCompiler we learned a lot, but ultimately had to start over. MTK v11 made the symbolic side type-stable, which was orthogonal to the JSC approach of building easier-to-compile functions. That approach has been revived in DyadCompiler, which plugs better into the normal MTK stack, so it’s easier to maintain and can layer on more tricks, but most of that is still to come. We just had a live meeting with the crew today in Copenhagen to kick off the new effort here, so we will see how it goes.

Getting Jacobians via symbolics is almost always the wrong idea. Autodiff is simply going to be faster.

If you know a very specific structure of your problem, then directly building an optimized representation for it is always going to be a good way to get something good. GraphDynamics for example specializes on only having ODEs, causal connections, and graph connections. It directly generates for loops with that exact structure (sketched below). If your problem doesn’t fit that structure you’re out of luck, but if it does it’s a great modeling environment.
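
For intuition, here is a generic sketch of what such a structured numeric RHS looks like (my illustration, not GraphDynamics.jl’s actual internals): one loop over nodes and one over existing edges, so the code size stays constant no matter how many nodes there are.

```
# Hand-structured RHS exploiting graph structure: O(1) code size,
# O(nodes + edges) runtime work, nothing for the compiler to choke on.
function rhs!(du, u, (adj, k), t)
    @inbounds for i in eachindex(u)
        acc = 0.0
        for j in adj[i]              # only the edges that actually exist
            acc += sin(u[j] - u[i])  # pairwise causal coupling
        end
        du[i] = -u[i] + k * acc
    end
    return nothing
end

adj = [[2], [1, 3], [2]]  # a 3-node path graph
du = zeros(3)
rhs!(du, [0.1, 0.2, 0.3], (adj, 0.5), 0.0)
```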

ModelingToolkit lives in the most pessimistic space of possibly any DAE relations and attempts to generate code for that. Ultimately it is nice because its tricks work everywhere, but working so generically also makes it more difficult to get specific cases to full optimality without extra information from the user about structure. The general method has wide reach and keeps growing in what it covers, but ultimately expert knowledge will beat it until we get that encoded into the system, and there are lots of domains to get in there, so there’s plenty of room for other modeling frameworks.

Structure recovery seems better to get to first because most people seem to have at least some structure, plus specializing out linear parts.

Though I’ll say OrdinaryDiffEq v7 is taking a ton of attention right now, but hopefully it’s done this week…

obgfm_anal.jl (7.4 KB)
powernet_funcs.jl (2.9 KB)
simulate - SM - 1.jl (3.7 KB)

I have uploaded a script for a small, dense model (a reduced model that neglects fast dynamics); the data .jld2 file is not allowed to be uploaded. The code demonstrates a significant performance bottleneck: while symbolic generation for systems like IEEE-118 or IEEE-300 is acceptable, the overhead of creating an ODEProblem and executing the first solve is excessively high. Specifically, I require the Jacobian function for my analysis, but the initial call to prob.f.jac is extremely time-consuming. I initially suspected system scale was the issue, but the problem appears even for small models.

The main function is in simulate-SM-1.jl. At this stage, I have to switch to a numerical model.

Can you post this to the issues in the repo? We use the issue list to sort and prioritize work, so it’ll get lost here but it’ll get discussion there.

ForwardDiff.jacobian is likely a better way to handle this.
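
For example (a minimal sketch; the RHS below is a stand-in for the generated function, assuming the usual f!(du, u, p) signature), the Jacobian can be computed with dual numbers instead of compiling a symbolic one:

```
using ForwardDiff

# Stand-in for a build_function-generated in-place RHS.
function f!(du, u, p)
    du[1] = p[1] * u[1] - u[1] * u[2]
    du[2] = u[1] * u[2] - p[2] * u[2]
    return nothing
end

p  = [1.5, 3.0]
u  = [1.0, 1.0]
du = similar(u)
J  = zeros(2, 2)

# Differentiates through a single call to f!; nothing beyond the RHS
# itself ever hits the Julia/LLVM compiler.
ForwardDiff.jacobian!(J, (du, u) -> f!(du, u, p), du, u)
```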

Would method splitting for the ODE RHS be that complicated to add? This issue has come up for years, so perhaps getting some solution in, even if it is not ideal, would still be helpful. At this point I don’t know the Symbolics build_function code well enough to understand whether it would be hard to detect that the list of symbolic expressions/equations is over some threshold and then start splitting them up (or at least to add this as a kwarg that can be turned on). But at a high level this doesn’t sound like it would be that big a modification to implement in a crude way? (A rough sketch of the idea follows.)
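
To illustrate what such splitting could look like (a hand-rolled sketch, not an existing Symbolics/MTK feature; build_chunked is a hypothetical helper): generate one in-place function per chunk of expressions so no single generated body overwhelms the compiler, then dispatch over the chunks at runtime.

```
using Symbolics

# Hypothetical helper: compile `exprs` in chunks instead of one huge body.
function build_chunked(exprs, args...; chunk = 100)
    ranges = collect(Iterators.partition(1:length(exprs), chunk))
    # One in-place generated function per chunk ([2] is the in-place variant).
    fs = [build_function(exprs[r], args...; expression = Val(false))[2]
          for r in ranges]
    function f!(out, xs...)
        for (f, r) in zip(fs, ranges)
            f(view(out, r), xs...)  # each small body compiles independently
        end
        return out
    end
    return f!
end

@variables x[1:4]
exprs = [x[1] + x[2], x[2]^2, x[3] * x[4], sin(x[4])]
f! = build_chunked(exprs, collect(x); chunk = 2)
f!(zeros(4), [1.0, 2.0, 3.0, 4.0])
```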

While I agree with the general sentiment, in the audio domain I would be happy with an “offline” compilation step that takes 3 days if it makes the “realtime” solve faster.

To deal with these competing demands, MTK is going to need to adopt the specialization levels that SciMLBase introduced for latency handling, i.e. NoSpecialize vs AutoSpecialize vs FullSpecialize. In AutoSpecialize, we can definitely compile less by just (symbolically) interpreting a lot of expressions; those never hit the Julia compiler but are just slower at runtime. In many cases that’s likely fast enough to not even show up in any profile in a meaningful way. One place where this is likely true is remake if you change a subset of variables, where we have to build the symbolic change functions and compile them, but in most cases it’s likely okay, or even faster, to just symbolically interpret that change using substitute.

But yes, real-time audio applications will then ask for the version that compiles absolutely everything to generate a really slim artifact. So I think we just need to end up with different levels where the user makes a choice about the balance appropriate for their application, just like we did with the ODE. And in the ODE case, AutoSpecialize ended up being good enough that 99.9% of people probably don’t even touch that part and always just use the auto form, so it ends up just being a latency improvement without a meaningful runtime hit.
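
For reference, this is the existing ODE-level mechanism being referred to (real SciMLBase API; the latency/runtime tradeoff is chosen at problem construction):

```
using SciMLBase, OrdinaryDiffEq

f(u, p, t) = -u

# Less compiler specialization: lower latency, small call overhead at runtime.
prob_low_latency = ODEProblem{false, SciMLBase.NoSpecialize}(f, 1.0, (0.0, 1.0))

# Full specialization: maximum compilation, fastest runtime artifact.
prob_full = ODEProblem{false, SciMLBase.FullSpecialize}(f, 1.0, (0.0, 1.0))

solve(prob_full, Tsit5())
```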

The difficult part is that the splitting must not be done too crudely, as it can get expensive itself. The easiest way to get the linear part is to take the Jacobian, grab all the constant entries, and then diff them out of the equation by running simplify. That simplify can amount to a lot of simplify calls though, so it can be slow. Dhariya does have good working code for this that improves PDE cases like Bruss, and I’m having him look at BCR next. This is getting the whole compiler team involved, as it’s actually turned into quite a difficult (/interesting) problem. (A toy version of the constant-grabbing step is sketched below.)
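
A toy sketch of that constant-grabbing step (my illustration of the idea, not the actual working code mentioned above): Jacobian entries containing no state variables are the constant linear coefficients, and simplify cancels them out of the remainder.

```
using Symbolics

@variables x y
vars = [x, y]

# Two cells: cubic local reaction plus linear diffusion-style coupling.
rhs = [x^3 + (y - x),
       y^3 + (x - y)]

J = Symbolics.jacobian(rhs, vars)

# Jacobian entries with no remaining variables are constant coefficients;
# they belong to the linear operator A.
is_const(ex) = isempty(Symbolics.get_variables(ex))
A = [is_const(J[i, j]) ? Symbolics.value(J[i, j]) : 0.0
     for i in eachindex(vars), j in eachindex(vars)]

# Diff the linear part out; `simplify` cancels the extracted terms,
# leaving the nonlinear remainder g(u) with f(u) = A*u + g(u).
g = simplify.(rhs .- A * vars)
```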

MTK works exceptionally well for small-scale simulations and for learning fundamental concepts, as it allows me to build models from first principles, and I have benefited a great deal from using it. My comment only reflects the practical challenges that arise specifically when compiling large-scale, complex models.