Turing & DifferentialEquations - memory allocation

I use Turing for MCMC on a simple DAE, and the computation time is on the order of 4.6 hours, with what appears to be extreme memory allocation.

My dynamic model has 3 differential equations and 3 linear algebraic equations, fitted to real experimental data (i.e., I don’t know how good the model is – it seems to be decent). I measure 2 of the 3 differential variables and 2 of the 3 algebraic variables.

By eliminating the 3 algebraic equations and implementing the model directly in DifferentialEquations, the solution is found approximately twice as fast as when implementing the original DAE using ModelingToolkit – possibly because MTK pushes 2 of the AEs to observed but keeps one AE, and hence must use a solver that supports a mass matrix.
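For illustration, a toy mass-matrix formulation of the kind MTK ends up with (not my thermal model) – the zero row in the mass matrix encodes the one retained algebraic equation and forces a mass-matrix-capable solver such as Rodas5:

using DifferentialEquations

function dae!(du, u, p, t)
    du[1] = -u[1] + u[3]             # differential equation
    du[2] = u[1] - u[2]              # differential equation
    du[3] = u[1] + u[2] - u[3]       # algebraic residual: 0 = u1 + u2 - u3
end

M = [1.0 0.0 0.0;
     0.0 1.0 0.0;
     0.0 0.0 0.0]                    # zero row => third equation is algebraic

prob_toy = ODEProblem(ODEFunction(dae!, mass_matrix = M), [1.0, 0.0, 1.0], (0.0, 1.0))
sol_toy = solve(prob_toy, Rodas5())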

I then construct a Turing model and attempt to solve the MCMC problem. I use 1_000 burn-in iterations, followed by 2_000 iterations from the posterior distribution, and the results appear to be OK.

HOWEVER, it takes some 4.6 hours to find the solution with 3 chains on 3 threads. Using the @time macro, the results are:

[screenshot of the @time output]

So – 193 G allocations: 12 TiB, 63% gc time…
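For reference, the timed call is essentially of this form (reconstructed from the sampling snippet further down; iteration and chain counts as quoted above):

model = fit_thermal(y_data, prob, t_data)
@time chn = sample(model, NUTS(1_000, 0.65), MCMCThreads(), 2_000, 3)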

If I run the identical code in a single cell of a Jupyter notebook in VS Code, the allocation is similar, but the run seems to take twice as long (I restarted the computer between these runs).

Question: any thoughts on why the allocation is so horribly big?

  • I have checked the eltype of simulation results (DifferentialEquations), and they are Float64.
  • I have checked the eltype of what is computed in the Turing model, and these seem to be either:
    Float64, or
    ForwardDiff.Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}, Float64, 7}
    (one way to do such a check is sketched below)
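For concreteness, one way to do such a check (a sketch; prob is the ODE problem that is later passed to the Turing model):

using DifferentialEquations

sol = solve(prob)
@show eltype(first(sol.u))   # Float64 outside the Turing model; under
                             # ForwardDiff-based AD the states become Duals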

The Turing model looks as follows:

@model function fit_thermal(y_data, prob, t_data)
    # Measurement variance priors
    V_Ts_d  ~ d_V_Ts_d
    V_TFe_d ~ d_V_TFe_d
    V_Tac_d ~ d_V_Tac_d
    V_Tah_d ~ d_V_Tah_d
    # 
    # Initial differential variable priors
    Tr0 ~ d_Tr
    Ts0 ~ d_Ts
    TFe0 ~ d_TFe
    # 
    # Heat transfer priors
    UAr2d ~ d_UAr2d
    UAs2Fe ~ d_UAs2Fe
    UAFe2a ~ d_UAFe2a
    UAx ~ d_UAx
    # 
    # Heat source priors
    QdFes ~ d_QdFes
    Qdfs ~ d_Qdfs

    u0_rand = [Tr0, Ts0, TFe0]
    p_rand = [UAr2d, UAs2Fe, UAFe2a, UAx, QdFes, Qdfs]
    # 
    prob_turing = remake(prob, u0=u0_rand, p=p_rand)
    sol = solve(prob_turing)

    # Output data model
    Nd = length(t_data)
    for i = 1:Nd
        u = sol(t_data[i])           # interpolate the solution once per data point
        Ts = u[2]
        TFe = u[3]
        Tah, Tac = thermal_genODE_outputs([u], p_rand, t_data[i])
        # the V_* variances go on the covariance diagonal
        # (Diagonal requires `using LinearAlgebra`)
        y_data[:, i] ~ MvNormal([Ts, TFe, Tac, Tah],
                                Diagonal([V_Ts_d, V_TFe_d, V_Tac_d, V_Tah_d]))
    end
end

Here, the prior distributions have been defined outside of the function, i.e., in the global scope. If I instead declare them inside the Turing model (as done in the Turing examples), they live in the local scope (is that an advantage?), but they have to be re-created every time the Turing model function is called.
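If I keep them in the global scope, I could at least declare them const so that the global lookups are type-stable (a sketch – the distribution choices below are placeholders, not my actual priors):

const d_V_Ts_d = InverseGamma(2, 3)                     # placeholder
const d_Tr     = Normal(20.0, 5.0)                      # placeholder
const d_UAr2d  = truncated(Normal(1.0, 0.5), 0.0, Inf)  # placeholder
# ... and similarly for the remaining priors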

Question: Is there a syntax for re-using the memory location for some of the quantities within the Turing model?

  • I could pass work arrays, etc., as input arguments to the function and modify these in place – at least for arrays of fixed size (see the sketch after this list).
  • I don’t know what the Turing @model macro does “behind the scenes”, so it is not clear to me how I can re-use memory without allocating anew each time the Turing model function is called.
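For instance, something along these lines (a toy sketch, not my thermal model; note that under ForwardDiff-based AD the sampled parameters become Dual numbers, so a Float64 buffer can only hold parameter-independent scratch values):

using Turing

@model function toy(y, work::Vector{Float64})
    σ ~ Exponential(1.0)
    for i in eachindex(y)
        work[i] = 2.0 * i          # parameter-independent scratch only –
                                   # a Float64 buffer cannot store Duals
        y[i] ~ Normal(work[i], σ)
    end
end

work = zeros(10)                   # allocated once, outside the sampler
model = toy(2.0 .* (1:10) .+ 0.1 .* randn(10), work)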

Did you use the allocation profiler? Share the flamegraph.


I did not use the allocation profiler – but I will try to do so. The last time I used it was in the Juno era. I’ll share the flamegraph when I have figured out how to use it.

  • Is it valuable to run a limited number of burn-in and posterior iterations, i.e., will that give a representative flamegraph, or should I run the entire thing – which takes “forever”? [sorry… I’m impatient :roll_eyes:]

Would the following way of generating the profile be ok?

using Plots, Profile, FlameGraphs

Profile.clear()

# profile the full sampling call
@profile sample(model, NUTS(Nburn, 0.65), MCMCThreads(), Npos, Nc)

# build the flamegraph (C=true includes C frames) and render it to pixels
g = flamegraph(C=true)
img = flamepixels(g)
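Or should I rather use the dedicated allocation profiler? A sketch of what I have in mind (assuming Julia ≥ 1.8 and the PProf package):

using Profile, PProf

Profile.Allocs.clear()
# sample a fraction of all allocations to keep the profiling overhead down
Profile.Allocs.@profile sample_rate=0.01 sample(model, NUTS(Nburn, 0.65), MCMCThreads(), Npos, Nc)
PProf.Allocs.pprof()   # opens an interactive flamegraph in the browser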

Hm…

I don’t get much out of it.

  • Is there a way to save it in a format that can simply be uploaded to Discourse? (e.g., along the lines sketched below?)
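For instance, would saving the pixel matrix from flamepixels as a PNG work (a sketch, assuming the FileIO and ImageIO packages)?

using FileIO, ImageIO

save("flamegraph.png", img)   # img = flamepixels(g) from above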