Nested submodels slow down Turing ~7x (Mooncake AD)

Probably the first thing I’d investigate is benchmarking the other AD backends, to see whether the slowdown is a Turing problem, an AD problem, or both. It’s been a while since I did anything with Turing, but the easiest way is with run_ad from DynamicPPL’s AD test utilities (see API · DynamicPPL):

using DynamicPPL, Distributions, ADTypes, (ALL_YOUR_AD_BACKENDS...)
using DynamicPPL.TestUtils.AD: run_ad

@model function eightsch(...) ... end
model = eightsch(...)

for adtype in (AutoForwardDiff(), ...)
    run_ad(model, adtype; test=false, benchmark=true)
end

In general, I’d expect some performance loss from a submodel, but 7x is a bit excessive, especially if your model is already nontrivial: in that case the overhead of the submodel should be small compared to the cost of actually doing the work.