# Turing & DifferentialEquations - memory allocation

I use Turing for MCMC on a simple DAE, and the computation time is on the order of 4.6 hours, with apparently extreme memory allocation.

My dynamic model has 3 differential equations and 3 linear algebraic equations, fitted to real experimental data (i.e., I don’t know how good the model is, but it seems to be decent). I measure 2 of the 3 differential variables and 2 of the 3 algebraic variables.

By eliminating the 3 algebraic equations and implementing the model directly with `DifferentialEquations`, the solution is found approximately twice as fast as when implementing the original DAE via `ModelingToolkit` – possibly because MTK pushes 2 of the algebraic equations to `observed` but keeps one, and hence uses a mass-matrix solver.
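For reference, the mass-matrix formulation that MTK ends up with can be sketched on a toy system. This is *not* the thermal model, just a minimal 2-state example with one differential equation and one algebraic residual:

```julia
using DifferentialEquations, LinearAlgebra

# Toy system (placeholder, not the thermal model):
#   differential: dx/dt = -x + y
#   algebraic:    0 = x - 2y   (row with zero mass)
function f!(du, u, p, t)
    x, y = u
    du[1] = -x + y      # differential equation
    du[2] = x - 2y      # algebraic residual
end

M = Diagonal([1.0, 0.0])                  # singular mass matrix
fun = ODEFunction(f!, mass_matrix = M)
u0 = [1.0, 0.5]                           # consistent: x - 2y = 0
prob = ODEProblem(fun, u0, (0.0, 1.0))
sol = solve(prob, Rodas5())               # stiff solver that accepts mass matrices
```

Eliminating the algebraic row by hand (here: substituting `y = x/2` into the first equation) gives a plain 1-state ODE, which is what I did to get the roughly 2× speedup.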

I then construct a Turing model, and attempt to solve the MCMC problem. I use `1_000` burn-in iterations, followed by `2_000` iterations from the posterior distribution, and the results appear to be ok.

HOWEVER, it takes some 4.6 hours to find the solution with 3 chains on 3 threads. Timing the call with the `@time` macro, the headline numbers are:

193 G allocations: 12 TiB, 63% gc time…

If I run the identical code in a single cell of a Jupyter notebook in VS Code, the allocation is similar, but it seems to take twice as long (I restarted the computer between these runs).

Question: any thoughts on why the allocation is so horribly big?

• I have checked the `eltype` of the simulation results (`DifferentialEquations`), and they are `Float64`.
• I have checked the `eltype` of what is computed in the `Turing` model; these seem to be either `Float64` or `ForwardDiff.Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}, Float64, 7}`.

The Turing model looks as follows:

```julia
@model function fit_thermal(y_data, prob, t_data)
    # Measurement variance priors
    V_Ts_d  ~ d_V_Ts_d
    V_TFe_d ~ d_V_TFe_d
    V_Tac_d ~ d_V_Tac_d
    V_Tah_d ~ d_V_Tah_d

    # Initial differential variable priors
    Tr0  ~ d_Tr
    Ts0  ~ d_Ts
    TFe0 ~ d_TFe

    # Heat transfer priors
    UAr2d  ~ d_UAr2d
    UAs2Fe ~ d_UAs2Fe
    UAFe2a ~ d_UAFe2a
    UAx    ~ d_UAx

    # Heat source priors
    QdFes ~ d_QdFes
    Qdfs  ~ d_Qdfs

    u0_rand = [Tr0, Ts0, TFe0]
    p_rand  = [UAr2d, UAs2Fe, UAFe2a, UAx, QdFes, Qdfs]

    prob_turing = remake(prob, u0 = u0_rand, p = p_rand)
    sol = solve(prob_turing)

    # Output data model
    Nd = length(t_data)
    for i in 1:Nd
        u = sol(t_data[i])      # interpolate the solution once per data point
        Ts  = u[2]              # state order is [Tr, Ts, TFe]
        TFe = u[3]
        Tah, Tac = thermal_genODE_outputs(u, p_rand, t_data[i])
        # V_* are variances, so pass a diagonal covariance (needs LinearAlgebra)
        y_data[:, i] ~ MvNormal(vec([Ts TFe Tac Tah]),
                                Diagonal([V_Ts_d, V_TFe_d, V_Tac_d, V_Tah_d]))
    end
end
```

Here, the prior distributions have been defined outside of the function, i.e., in the global scope. If I declare them inside the Turing model (as done in the Turing examples), they live in the local scope (an advantage?), but then they have to be re-created every time the Turing model is called.
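A third option I could try is bundling the priors into a `NamedTuple` and passing them in as a model argument, so the model reads local fields instead of non-`const` globals, without re-creating the distributions on every call. A minimal sketch (the prior names mirror my model; the distributions and data are placeholders):

```julia
using Turing, Distributions

# Priors built once, outside the model; placeholder choices
priors = (
    d_V_Ts_d = truncated(Normal(0.0, 1.0); lower = 0.0),
    d_Tr     = Normal(20.0, 5.0),
)

# The model receives the priors as an argument -> local, type-stable access
@model function fit_demo(y, priors)
    V_Ts_d ~ priors.d_V_Ts_d
    Tr0    ~ priors.d_Tr
    y ~ Normal(Tr0, sqrt(V_Ts_d))   # V_Ts_d is a variance
end

chain = sample(fit_demo(21.0, priors), NUTS(), 100; progress = false)
```

This keeps the "built once" property of the global-scope version while avoiding untyped-global lookups inside the model.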

Question: Is there a syntax for re-using the memory location for some of the quantities within the Turing model?

• I could pass work arrays, etc., as input arguments to the function and mutate them in place – at least for arrays of fixed size.
• I don’t know what the `@model` macro does “behind the scenes”, so it is not clear to me how to re-use memory instead of allocating anew each time the Turing model function is called.
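For the work-array idea, the complication is that under NUTS the buffer must sometimes hold `ForwardDiff.Dual` numbers, so a plain `Vector{Float64}` workspace won’t work. Here is the kind of thing I have in mind, using `DiffCache`/`get_tmp` from PreallocationTools.jl, which keeps one buffer per element type (a sketch, not my actual model):

```julia
using PreallocationTools, ForwardDiff

# One preallocated workspace that serves both Float64 and Dual evaluations
cache = DiffCache(zeros(3))

function work!(cache, u)
    tmp = get_tmp(cache, u)   # returns a buffer with eltype matching u
    tmp .= u .* 2             # in-place scratch work, no fresh allocation
    return sum(tmp)
end

work!(cache, [1.0, 2.0, 3.0])                                 # Float64 path
ForwardDiff.gradient(u -> work!(cache, u), [1.0, 2.0, 3.0])   # Dual path, same cache
```

The cache could then be passed into the Turing model as an ordinary argument, like `prob` and `t_data`.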

Did you use the allocation profiler? Share the flamegraph.


I did not use the allocation profiler, but I will try to do so. The last time I used it was in the Juno era. I’ll share the flamegraph once I have figured out how to use it.

• Is it sufficient to run a limited number of burn-in and posterior iterations, i.e., will that give a representative flamegraph, or should I run the entire thing, which takes “forever”? [Sorry… I’m impatient.]

Would the following way of generating the profile be ok?

```julia
using Plots, Profile, FlameGraphs

Profile.clear()

# Nburn, Npos, Nc: burn-in iterations, posterior iterations, number of chains
@profile sample(model, NUTS(Nburn, 0.65), MCMCThreads(), Npos, Nc)

g = flamegraph(C = true)
img = flamepixels(g)
```
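To get something I can attach to a post, I assume the pixel matrix from `flamepixels` can be written to a PNG with `FileIO` (this needs an image writer such as `ImageIO` installed; `workload()` below is just a stand-in for the profiled call):

```julia
using Profile, FlameGraphs, FileIO

workload() = sum(abs2, rand(10^7))   # stand-in for the real `sample(...)` call

Profile.clear()
@profile workload()

g = flamegraph(C = true)
img = flamepixels(g)           # matrix of RGB pixels
save("flamegraph.png", img)    # PNG that can be uploaded to the forum
```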


Hm…

I don’t get much out of it.

• Is there a way to save it in a format that can simply be uploaded to Discourse?