I am using Turing for MCMC on a simple DAE, and the computation takes on the order of 4.6 hours with what appears to be extreme memory allocation.
My dynamic model has 3 differential equations and 3 linear algebraic equations, fitted to real experimental data (i.e., I don't know how good the model is; it seems to be decent). I measure 2 of the 3 differential variables and 2 of the 3 algebraic variables.
By eliminating the 3 algebraic equations and implementing the model as a plain ODE using `DifferentialEquations`, the solution is found approximately twice as fast as when implementing the original DAE using `ModelingToolkit`. This is possibly because MTK moves 2 of the algebraic equations to observed variables but keeps one algebraic equation, and hence uses a solver with a mass matrix.
I then construct a Turing model and attempt to solve the MCMC problem. I use `1_000` burn-in iterations, followed by `2_000` iterations from the posterior distribution, and the results appear to be OK.
HOWEVER, it takes some 4.6 hours to find the solution with 3 chains on 3 threads. Timing the run with the `@time` macro reports 193 G allocations: 12 TiB, 63% gc time…
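The timed call is essentially the following (a sketch; I am assuming NUTS here, and the exact sampler settings in my script may differ):

```julia
using Turing

model = fit_thermal(y_data, prob, t_data)

# 1_000 adaptation/burn-in steps, 2_000 posterior draws per chain,
# 3 chains in parallel on 3 threads
@time chains = sample(model, NUTS(1_000, 0.65), MCMCThreads(), 2_000, 3)
```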
If I run the identical code in a single cell of a Jupyter notebook in VS Code, the allocation is similar, but it seems to take twice as long (I restarted the computer between these runs).
Question: any thoughts on why the allocation is so horribly big?
- I have checked the `eltype` of simulation results (`DifferentialEquations`), and they are `Float64`.
- I have checked the `eltype` of what is computed in the `Turing` model, and these seem to be either:
  - `Float64`, or
  - `ForwardDiff.Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}, Float64, 7}`
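If I understand it correctly, the `Dual` type appears because Turing's ForwardDiff AD backend pushes dual numbers through the model to compute gradients; the trailing `7` is the ForwardDiff chunk size, i.e., how many partial derivatives each `Dual` carries. A toy illustration (unrelated to my model):

```julia
using ForwardDiff

f(x) = sum(abs2, x)
x = rand(13)

# With a chunk size of 7, f is internally evaluated on
# Dual{Tag, Float64, 7} numbers, each carrying 7 partials.
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{7}())
ForwardDiff.gradient(f, x, cfg)
```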
The Turing model looks as follows:
```julia
@model function fit_thermal(y_data, prob, t_data)
    # Measurement variance priors
    V_Ts_d ~ d_V_Ts_d
    V_TFe_d ~ d_V_TFe_d
    V_Tac_d ~ d_V_Tac_d
    V_Tah_d ~ d_V_Tah_d
    #
    # Initial differential variable priors
    Tr0 ~ d_Tr
    Ts0 ~ d_Ts
    TFe0 ~ d_TFe
    #
    # Heat transfer priors
    UAr2d ~ d_UAr2d
    UAs2Fe ~ d_UAs2Fe
    UAFe2a ~ d_UAFe2a
    UAx ~ d_UAx
    #
    # Heat source priors
    QdFes ~ d_QdFes
    Qdfs ~ d_Qdfs
    u0_rand = [Tr0, Ts0, TFe0]
    p_rand = [UAr2d, UAs2Fe, UAFe2a, UAx, QdFes, Qdfs]
    #
    prob_turing = remake(prob, u0=u0_rand, p=p_rand)
    sol = solve(prob_turing)
    # Output data model
    Nd = length(t_data)
    for i = 1:Nd
        Ts = sol(t_data[i])[2]
        TFe = sol(t_data[i])[3]
        Tah, Tac = thermal_genODE_outputs([sol(t_data[i])], p_rand, t_data[i])
        y_data[:, i] ~ MvNormal(vec([Ts TFe Tac Tah]), [V_Ts_d, V_TFe_d, V_Tac_d, V_Tah_d])
    end
end
```
Here, the prior distributions have been defined outside of the function, i.e., in the global scope. If I instead declare them inside the Turing model (as done in the Turing examples), they live in the local scope (is that an advantage?), but then they have to be re-created every time the Turing model is called.
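A middle ground I have considered is to construct the priors once and pass them to the model as an argument, so they are neither globals nor rebuilt per call. A minimal sketch with placeholder priors (not my actual ones):

```julia
using Turing, Distributions

# Hypothetical pattern: priors bundled in a NamedTuple and passed in once.
@model function fit_demo(y, priors)
    σ² ~ priors.σ²
    μ  ~ priors.μ
    y  ~ Normal(μ, sqrt(σ²))
end

priors = (σ² = InverseGamma(2, 3), μ = Normal(20.0, 5.0))  # placeholder choices
model  = fit_demo(1.23, priors)
```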
Question: Is there a syntax for re-using the memory location for some of the quantities within the Turing model?
- I could pass work arrays, etc. as input arguments to the function and mutate them in place, at least for arrays of fixed size (see the sketch after this list).
- I don't know what the Turing macro does "behind the scenes", so it is not clear to me how I can re-use memory without allocating new memory each time the Turing model function is called.
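For illustration, the kind of pattern I have in mind, as a hypothetical sketch. One caveat I can already see: with a gradient-based sampler such as NUTS, the parameters arrive as `ForwardDiff.Dual` numbers, so a fixed `Float64` buffer only works with samplers that do not differentiate the model (e.g. `MH()`); otherwise the buffer's eltype would have to match the parameter type.

```julia
using Turing, Distributions

# Hypothetical sketch: a work array allocated once outside the model,
# passed in as an argument, and mutated in place. Works as-is with MH()
# (no AD); NUTS would require a Dual-compatible buffer eltype.
@model function fit_buffered(y, buf)
    μ ~ Normal(0.0, 1.0)
    for i in eachindex(y)
        buf[i] = μ + 0.1 * i          # in-place update, no fresh allocation
        y[i] ~ Normal(buf[i], 1.0)
    end
end

y   = randn(10)
buf = zeros(length(y))                # Float64 work array, allocated once
chn = sample(fit_buffered(y, buf), MH(), 500)
```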