Turing.jl slow inference

Hi,

I have specified the following model in Turing.jl:

\begin{align} e^{\mathcal{L}C_\mathrm{i}}\dot{T}_\mathrm{i} =& e^{{\mathcal{L}A_\mathrm{w}}}\Phi_{\mathrm{s}}(t) + e^{-\mathcal{L}R_{\mathrm{ih}}} \left[T_{\mathrm{h}}(t) - T_{\mathrm{i}}(t)\right] + e^{-\mathcal{L}R_{\mathrm{ie}}} \left[T_{\mathrm{e}}(t) - T_{\mathrm{i}}(t)\right]\\ e^{\mathcal{L}C_\mathrm{e}} \dot{T}_\mathrm{e} =& e^{{\mathcal{L}A_\mathrm{e}}}\Phi_{\mathrm{s}}(t) + e^{-\mathcal{L}R_{\mathrm{ea}}} \left[T_{\mathrm{a}}(t) - T_{\mathrm{e}}(t)\right] + e^{-\mathcal{L}R_{\mathrm{ie}}} \left[T_{\mathrm{i}}(t) - T_{\mathrm{e}}(t)\right]\\ e^{\mathcal{L}C_\mathrm{h}}\dot{T}_\mathrm{h} =& \Phi_{\mathrm{cv}}(t) + \Phi_{\mathrm{hp}}(t) + e^{-\mathcal{L}R_{\mathrm{ih}}} \left[T_{\mathrm{i}}(t) - T_{\mathrm{h}}(t)\right] \end{align}

The code is:

Model

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using ModelingToolkitStandardLibrary.Blocks
using DataInterpolations
using OrdinaryDiffEq

function System(input_funcs::Dict, params::Dict; name)
    @named T_a_out = TimeVaryingFunction(input_funcs[:T_a])
    @named Φ_hp_out = TimeVaryingFunction(input_funcs[:Φ_hp])
    @named Φ_cv_out = TimeVaryingFunction(input_funcs[:Φ_cv])
    @named Φ_s_out = TimeVaryingFunction(input_funcs[:Φ_s])

    vars = @variables T_i(t) T_e(t) T_h(t) T_a(t) Φ_hp(t) Φ_cv(t) Φ_s(t) 
    pars = @parameters lC_i=params[:lC_i] lC_e=params[:lC_e] lC_h=params[:lC_h] lR_ie=params[:lR_ie] lR_ea=params[:lR_ea] lR_ih=params[:lR_ih] lA_w=params[:lA_w] lA_e=params[:lA_e]

    eqs = [  
        # inputs
        T_a ~ T_a_out.output.u
        Φ_hp ~ Φ_hp_out.output.u
        Φ_cv ~ Φ_cv_out.output.u
        Φ_s ~ Φ_s_out.output.u

        # dynamics
        exp(lC_i)*D(T_i) ~ exp(lA_w)*Φ_s + exp(-lR_ih)*(T_h - T_i) + exp(-lR_ie)*(T_e - T_i)
        exp(lC_e)*D(T_e) ~ exp(lA_e)*Φ_s + exp(-lR_ea)*(T_a - T_e) + exp(-lR_ie)*(T_i - T_e)
        exp(lC_h)*D(T_h) ~ Φ_cv + Φ_hp + exp(-lR_ih)*(T_i - T_h)
    ]

    ODESystem(eqs, t, vars, pars; systems = [T_a_out, Φ_hp_out, Φ_cv_out, Φ_s_out], name)
end

# input functions
T_a_linf = LinearInterpolation(T_a_data, time)
Φ_hp_linf = LinearInterpolation(Φ_hp_data, time)
Φ_cv_linf = LinearInterpolation(Φ_cv_data, time)
Φ_s_linf = LinearInterpolation(Φ_s_data, time)

input_funcs = Dict(
    :T_a => T_a_linf,
    :Φ_hp => Φ_hp_linf,
    :Φ_cv => Φ_cv_linf,
    :Φ_s => Φ_s_linf
)

@named system = System(input_funcs, params_d)
sys = structural_simplify(system)
prob = ODEProblem(sys, [T_bias, T_bias, T_bias], (0, time[end]))
sol = solve(prob, Tsit5());

Turing spec
using Turing
using DifferentialEquations


@model function LGDS(x::AbstractArray, prob, params::Dict, t_int::Float64)

    # prior distributions
    # V ~ Wishart(params[:ν_m], params[:Λ_m])
    # Vinv ~ InverseWishart(params[:ν_m], params[:Λ_m])
    σ ~ InverseGamma(2, 3)
    lC ~ MvNormal(params[:μ_C], params[:Σ_C])
    lR ~ MvNormal(params[:μ_R], params[:Σ_R])
    lA ~ MvNormal(params[:μ_A], params[:Σ_A])

    p = [lC..., lR..., lA...]

    # simulate LGDS
    # T_ode = solve(prob, Tsit5(); p=p, saveat=t_int, save_idxs=[1,3], verbose=false)
    T_ode = solve(prob, Tsit5(); p=p, saveat=t_int, verbose=false)
    if length(T_ode) != length(x)
        return
    end

    # observations
    for i in 1:length(x)
        # x[:, i] ~ MvNormal(T_ode[i], Vinv)
        # x[:, i] ~ MvNormalMeanPrecision(T_ode[i], V)
        x[:, i] ~ MvNormalMeanPrecision(T_ode[i], σ^2 * I)
    end

    return nothing
end

# instantiate model
input_data = Array(data[!, [:T_i, :T_e, :T_h]])
model = LGDS(input_data, prob, prior_params, Δt)

# sample independent chains
n_chains = 3
chain = sample(model, NUTS(0.65), MCMCSerial(), 1000, n_chains; progress=true)

The Turing.jl code has two major problems:

  • It’s very slow. The above turing code took about 80 minutes to finish.
  • The posteriors are very wide, as you can see below.

I am just looking for some performance / efficiency tips, because the model is not performing well and since it’s also slow it makes it very hard to iterate.

(The model above is third order, I have also specified a first order model with only two parameters and for that model I was able to find a very accurate estimate.)

Can’t say for sure without a full runnable example, but a couple of things I noticed that you could try:

  • Splatting your parameter vectors (p = [lC..., lR..., lA...]) is going to be way slower than concatenating them (p = [lC; lR; lA]).
  • How many parameters are there total? If it’s more than a dozen or so, using a reverse-mode autodiff backend will probably give you a speedup over the default ForwardDiff.
  • You can always profile and check your Turing model for type-stability, see Performance Tips

From your plot, the chains look like they’re converging and mixing well, so if the posteriors are too wide, I’d guess its either something related to an insufficiently informative dataset, or something related to the specification of the model. How large is your dataset?

Hello,

Thank you for your help. This is my first Turing project so I am still getting to grips with everything.

It’s quite difficult to add a MWE, but here is the most important code from the notebook I am working in:

Code

###  imports
using ModelingToolkit
using ModelingToolkitStandardLibrary.Blocks
using ModelingToolkit: t_nounits as t, D_nounits as D

using DataInterpolations
using OrdinaryDiffEq
using DifferentialEquations
using Symbolics
using Plots
using Latexify
using LaTeXStrings

using LinearAlgebra
using Distributions
using Random
using Measures

### constants

# time-varying ambient temperature
T_zeroC = 273.15 # [K]
T_bias = T_zeroC + 20 # [K]

Random.seed!(42)
rng = MersenneTwister(1234);

# time at end of simulation
t_end = 100.0

Δt = 0.2
time = 0:Δt:t_end

### Ambient temperature

# sinusoidal function around the bias temperature
T_a(t) = T_bias #+ 10 * sin((π/16)*t)
T_a_data = T_a.(time)
plot(time, T_a_data, xlabel = "t [sec]", ylabel = L"T_a(t)", legend=false)

### Heat input signals

# heat pump control signal
values_hp = zeros(Int(t_end))
rand_hp = max.(repeat(300*randn(rng, Int(t_end/4)), 1, 4)'[:], 0)
values_hp[begin:length(rand_hp)] = rand_hp
Φ_hp(t) = max(t >= t_end ? values_hp[end] : values_hp[Int(floor(t)) + 1], 0)

# boiler control signal
values_cv = zeros(Int(t_end))
rand_cv = max.(repeat(600*randn(rng, Int(t_end/4)), 1, 4)'[:], 0)
values_cv[begin:length(rand_cv)] = rand_cv
Φ_cv(t) = max(t >= t_end ? values_cv[end] : values_cv[Int(floor(t)) + 1], 0)

# plot control signals
Φ_hp_data = Φ_hp.(time)
Φ_cv_data = Φ_cv.(time)

plot(time, Φ_hp_data, xlabel = "t [sec]", ylabel = L"Φ [W]", label=L"Φ_\mathrm{hp}(t)", legendfontsize=12)
plot!(time, Φ_cv_data, xlabel = "t [sec]", label=L"Φ_\mathrm{cv}(t)")

### Solar irradiance

# functions for the solar irradiance
# Φ_s(t) = 100 * sin((π / t_end) * t)
Φ_s(t) = 100
Φ_s_data = Φ_s.(time)

plot(time, Φ_s_data, xlabel = "t [sec]", ylabel = L"Φ_\mathrm{s}(t)", label=L"Φ_\mathrm{s}(t)", legendfontsize=12)

### Model parameters

# :name => [component_value, initial_condition]
C_i_true = 3.0
C_e_true = 2.0
C_h_true = 1.0
R_ie_true = 0.2
R_ea_true = 0.5
R_ih_true = 0.1
A_w_true = 3.0
A_e_true = 1.0

# true noise
σ_true = 5.0

params_d = Dict(
    :lC_i => log(C_i_true),
    :lC_e => log(C_e_true),
    :lC_h => log(C_h_true), 
    :lR_ie => log(R_ie_true), 
    :lR_ea => log(R_ea_true), 
    :lR_ih => log(R_ih_true),
    :lA_w => log(A_w_true),
    :lA_e => log(A_e_true),
)

### Model equations

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using ModelingToolkitStandardLibrary.Blocks
using DataInterpolations
using OrdinaryDiffEq

function System(input_funcs::Dict, params::Dict; name)
    @named T_a_out = TimeVaryingFunction(input_funcs[:T_a])
    @named Φ_hp_out = TimeVaryingFunction(input_funcs[:Φ_hp])
    @named Φ_cv_out = TimeVaryingFunction(input_funcs[:Φ_cv])
    @named Φ_s_out = TimeVaryingFunction(input_funcs[:Φ_s])

    vars = @variables T_i(t) T_e(t) T_h(t) T_a(t) Φ_hp(t) Φ_cv(t) Φ_s(t) 
    pars = @parameters lC_i=params[:lC_i] lC_e=params[:lC_e] lC_h=params[:lC_h] lR_ie=params[:lR_ie] lR_ea=params[:lR_ea] lR_ih=params[:lR_ih] lA_w=params[:lA_w] lA_e=params[:lA_e]

    eqs = [  
        # inputs
        T_a ~ T_a_out.output.u
        Φ_hp ~ Φ_hp_out.output.u
        Φ_cv ~ Φ_cv_out.output.u
        Φ_s ~ Φ_s_out.output.u

        # dynamics
        exp(lC_i)*D(T_i) ~ exp(lA_w)*Φ_s + exp(-lR_ih)*(T_h - T_i) + exp(-lR_ie)*(T_e - T_i)
        exp(lC_e)*D(T_e) ~ exp(lA_e)*Φ_s + exp(-lR_ea)*(T_a - T_e) + exp(-lR_ie)*(T_i - T_e)
        exp(lC_h)*D(T_h) ~ Φ_cv + Φ_hp + exp(-lR_ih)*(T_i - T_h)
    ]

    ODESystem(eqs, t, vars, pars; systems = [T_a_out, Φ_hp_out, Φ_cv_out, Φ_s_out], name)
end

# input functions
T_a_linf = LinearInterpolation(T_a_data, time)
Φ_hp_linf = LinearInterpolation(Φ_hp_data, time)
Φ_cv_linf = LinearInterpolation(Φ_cv_data, time)
Φ_s_linf = LinearInterpolation(Φ_s_data, time)

input_funcs = Dict(
    :T_a => T_a_linf,
    :Φ_hp => Φ_hp_linf,
    :Φ_cv => Φ_cv_linf,
    :Φ_s => Φ_s_linf
)

@named system = System(input_funcs, params_d)
sys = structural_simplify(system)
prob = ODEProblem(sys, [T_bias, T_bias, T_bias], (0, time[end]))
sol = solve(prob, Tsit5());


### Discrete data generator

using DataFrames

function generate_data(time, noise::Float64, inputs::Dict, prob::ODEProblem, system::ODESystem)
    """
    Generate data for the RC Toy Model

    Args:
        time (StepRange): time vector
        noise (Float64): noise level 
        inputs (Dict): dictionary with input functions
        prob (ODEProblem): Problem statement

    Returns:
        output (DataFrame): DataFrame with columns keys(inputs) + prob.u
    """
    output = DataFrame()

    # get discrete inputs
    for (key, input_func) in inputs
        input_data = input_func.(time)
        output[!, key] = input_data
    end

    # get output variable names
    output_vars = Symbol.(replace.(string.(unknowns(sys)), r"\(t\)" => ""))
    
    d_sol = Array(solve(prob, Tsit5(); saveat=Float64(time.step)))
    d_sol_noisy = noise * randn(size(d_sol)) .+ d_sol
    d_sol_df = DataFrame(d_sol_noisy', output_vars)

    # merge df's and return
    return hcat(output, d_sol_df)
end

data = generate_data(time, σ_true, input_funcs, prob, sys)
first(data,5)

### Model prior parameters

using LinearAlgebra

n_states = length(unknowns(sys))
n_measured = 2

# identity matrices
IdT = Matrix(1.0I, n_states, n_states)
IdM = Matrix(1.0I, n_measured, n_measured)

# thermal capacitance prior
μ_C = [0 for _ in 1:n_states]
Σ_C = IdT * 1e0

# thermal resistance prior
μ_R = [-1 for _ in 1:n_states]
Σ_R = IdT * 1e0

# area prior
n_areas = 2
μ_A = [1 for _ in 1:n_areas]
Σ_A = IdM * 1e0

# measurement noise
ν_m = n_measured
Σ_m = IdM * 1e1
Λ_m = (1/n_measured) * inv(Σ_m)

# check that the prior is positive definite
@show isposdef(Σ_C)
@show isposdef(Σ_R)
@show isposdef(Σ_A)
@show isposdef(Σ_m)
@show isposdef(Λ_m)

# buid prior params dict
prior_params = Dict{Symbol,Any}(
    :μ_C => μ_C, 
    :Σ_C => Σ_C, 
    :μ_R => μ_R, 
    :Σ_R => Σ_R, 
    :μ_A => μ_A, 
    :Σ_A => Σ_A, 
    :ν_m => ν_m, 
    :Λ_m => Λ_m
)

### Inference with Turing

using Turing
using DifferentialEquations


@model function LGDS(x::AbstractArray, prob, params::Dict, t_int::Float64)

    # prior distributions
    σ ~ InverseGamma(2, 3)
    lC ~ MvNormal(params[:μ_C], params[:Σ_C])
    lR ~ MvNormal(params[:μ_R], params[:Σ_R])
    lA ~ MvNormal(params[:μ_A], params[:Σ_A])

    p = [lC; lR; lA]

    # we only save [T_i, T_h] because the real dataset won't have T_e
    T_ode = solve(prob, Tsit5(); p=p, saveat=t_int, save_idxs=[1,3], verbose=false)
    if length(T_ode) != length(x)
        return
    end

    # observations
    for i in 1:length(x)
        x[i] ~ MvNormal(T_ode[i], σ^2 * IdM)
    end

    return nothing
end

# instantiate model
input_data = Array(data[!, [:T_i, :T_h]])
model = LGDS(input_data, prob, prior_params, Δt)

# sample independent chains
n_chains = 5
chain = sample(model, NUTS(0.65), MCMCSerial(), 1000, n_chains; progress=true)

### Turing results

using StatsPlots

plot(chain)

cols = ["$(param)[$(i)]" for param in ["lC", "lR", "lA"] for i in 1:n_states][begin:end-1]

# vector to hold the sum(MSE) for each chain
error_chain = zeros(n_chains)
n_samples = 100

T_i_sample = data[!, :T_i]

for i in 1:n_chains

    # get the chain
    ch = chain[:, :, i]

    # get the posterior samples of C and R
    posterior_samples = DataFrame(sample(chain[:, :, 1], n_samples; replace=false))[!, cols]
    for p in eachrow(Array(posterior_samples))

        # solve the ODE with the posterior samples [C, R]
        sol_p = solve(prob, Tsit5(); p=p, saveat=Δt, save_idxs=1, verbose=false)
        if length(sol_p) != length(T_i_sample)
            # error_chain[i] = Inf
            continue
        end

        # calculate the normalized root mean squared error between sol_p and T_i_sample
        error_chain[i] += sum((sol_p .- T_i_sample).^2) / length(T_i_sample)
    end
end

chain_sel = argmin(error_chain)

println("Mean squared error for each chain: ", error_chain)
println("Chain with lowest error: ", chain_sel)

### Data retrodiction

plot(; legend=false)
posterior_samples = DataFrame(sample(chain[:, :, chain_sel], n_samples; replace=false))[!, cols]

nrmse = 0.0
# loop over posterior samples to plot the simulation
for p in eachrow(Array(posterior_samples))

    # solve the ODE model
    sol_p = solve(prob, Tsit5(); p=p, saveat=Δt, save_idxs=1, verbose=false)
    if length(sol_p) != length(T_i_sample)
        continue
    end

    # compute the NRMSE
    nrmse += sum((sol_p .- T_i_sample).^2) / length(T_i_sample)

    # plot the posterior samples
    plot!(sol_p; alpha=0.5, color="#BBBBBB")
end

# Plot simulation and noisy observations.
plot!(sol; idxs=[:T_i], size=(800,400), color=[1 2], linewidth=1, 
title="NRMSE: $(nrmse/100)", 
xlabel="t", 
ylabel="T_i", 
ylim=(minimum(T_i_sample)-5, 
maximum(T_i_sample)+5)
)
scatter!(0:4*Δt:t_end, T_i_sample[1:4:end], label="T_i (observed)", color=[1 2])

I can also share the full notebook if needed but the website here won’t let me upload .ipynb files.

Replying to your comments:

You are right, this seemed to help. What also helped speed it up is setting priors with low variance.

3 capacitors, 3 resistors and 2 areas so only 8 parameters in total. For context, the third-order RC network is supposed to model the dominant thermal dynamics of a house, the code above is toy model for which I am trying to estimate the known parameters with simulated data.

The posteriors have almost the same means as the priors, so it seems like there is no ‘learning’ being done at all, even when I set a large number of iterations. The means of the posterior are also too far off from the true values to be of any use. I simulated 100 seconds worth of data in ModelingToolkit.jl and I tried dt = 0.1, 0.2 and 0.5. So between 1000 and 200 samples. I also tried toning down the noise, but it doesn’t really have an effect.

I adapted the ODE model so that instead of finding C, R and A I try to estimate log(C), log(R) and log(A). From the first order toy model experiments I noticed this was much more numerically stable because it avoids the singularity at C=R=A=0. It also enforced the parameters to be positive and makes it possible to sample from standard normal distributions, so I think this works well.

All of this worked very well in the strongly simplified first order toy model, but I am unable to reproduce the good results with the third order toy model.

Any help is appreciated

Thanks, that does help clarify it some. For 8 parameters, the default ForwardDiff AD backend is probably the fastest.

Also, I think you have an error in your Turing model definition: the observation loop should be

    for i in 1:size(x,1)
        x[i, :] ~ MvNormal(T_ode[i], σ^2 * IdM)
    end

However, after some playing around, it seems like this model may just be poorly constrained/difficult to fit to the data. As you’ve observed, when the priors are tight, it samples quickly, but is just being guided by the priors. I can’t say anything intelligent about the system you’re modeling, but the fact that you could recover the parameters of a lower-order model would be consistent with that explanation…

Yes, that’s correct. I already fixed that in the code I shared above but Discourse wouldn’t let me edit the original post.

But I feel like there must be a way to make this work. It’s only 8 parameters.

NB: The class of models I am using was first proposed by Bacher and Madsen. There is an R package that implements this, which uses non-linear optimization to fit the parameters. I am trying to take the full Bayesian approach instead.

Ok, I think I figured it out! The issue is this block:

if length(T_ode) != length(x)
    return
end

It’s getting triggered at every sampling iteration, so the likelihood never actually gets evaluated. I think you just need to figure out how to ensure the ODE solution time points match the data array time points. I’m not sure why that isn’t happening already, but see if debugging that doesn’t solve the problem.

1 Like

This statement is needed because early on in the sampling process it may occur that that C and R values are sampled that lead to ‘unstable’ ODE solutions, and a bunch of warnings from DifferentialEquations.jl such as ‘itermax too low’ will show up. In those cases either a partial solution or no solution at all is returned and this is how I filtered those out.

I’m sure there is a better (ModelingToolkit.jl - esque) way to do this, but it seemed to work for now.

Anyway, I figured out something important in the meantime. I provided the model with data on both T_i and T_h. These quantities refer to specific temperature sensors in the system for which I have empirical data available.

I thought (and still think) that providing the model with two measured states would improve convergence, and it also makes sense to use all the data I have gather. However, for some reason that turns out not to be the case.

If I only provide T_i to the simulated toy model Turing is able to find very reasonable parameter estimates now:

But it’s still a mystery to me why the whole things breaks down if I provide both T_i and T_h. I also changed the observation model, now that T_\mathrm{ODE} is a scalar again:

    T_ode = solve(prob, Tsit5(); p=p, saveat=t_int, save_idxs=1, verbose=false)
    if length(T_ode) != length(x)
        return
    end

    # observations
    x ~ MvNormal(T_ode.u, σ^2 * I)

ps. really appreciate the discussion

Not following all details of your model, but DifferentialEquations has the nice feature that solutions are callable and interpolate values for the given time points if used in this fashion:

T_ode = solve(prob, Tsit5(); p=p) # No need to match exactly the time points of `x`, but saveat can be used as a hint
if !(SciMLBase.successful_retcode(T_ode.retcode))
   # Check convergence and reject if not
   return
end
tx = [0, t_int, 1]  # Time points corresponding to observations `x`
x_est = T_ode(tx)  # Gives interpolated values for these times
x ~ MvNormal(x_est, σ^2 * I)

This way, checking convergence and evaluating the solution are not linked. In particular, the solver is free to save additional points if needed for a stiff problem etc.

1 Like

Thanks, that seems to work fine.

I implemented it like this:

# timestep in seconds
Δt = 60

# stop simulation after n_days
n_days = 1
t_end = n_days * 60 * 60 * 24

time = 0:Δt:t_end

Model

@model function LGDS(x::AbstractArray, prob, params::Dict, t_int)

    # prior distributions
    σ ~ InverseGamma(2, 3)
    lC ~ MvNormal(params[:μ_C], params[:Σ_C])
    lR ~ MvNormal(params[:μ_R], params[:Σ_R])
    lA ~ MvNormal(params[:μ_A], params[:Σ_A])

    p = [lC; lR; lA]
   
    # remake the problem with the sampled parameter values
    prob_samp = remake(prob; p=p)

    T_ode = solve(prob_samp, Tsit5(); save_idxs=1, verbose=false)
    if !(SciMLBase.successful_retcode(T_ode.retcode))
       return
    end

    # time points corresponding to observations `x`
    x_est = T_ode(time)  
    x ~ MvNormal(x_est.u, σ^2 * I)    

    return nothing
end