Domain error for Turing model with callback


Following is a moderately large Turing model, where the ODE system is solved for 51 time steps with a discrete callback at t=20.


Section 1: Importing packages, files, and defining variables

using DifferentialEquations, Interpolations, XLSX, DataFrames, StatsPlots
using Distributions, DistributionsAD, MCMCChains, Memoization, Turing
using ReverseDiff
# Turing.setrdcache(true)
using Turing; Turing.setadbackend(:reversediff)
include("model_file.jl") # Loads the ODE model function our_model_v1!

tot_weeks = 51          # Number of time points
N = 1000000             # Parameter - population size
V = 2                   # Parameter - number of species
tspan = (1, tot_weeks)
tspan = Float64.(tspan) # Time period for simulation of ODEs

# The next two variables, IPTCC and mobil, are linearly interpolated forcing
# functions that are used in the model. They are read from two separate
# files, and the synthetic data was generated using the same variables.

# IPTCC is a forcing function
wet_data = DataFrame(XLSX.readtable("Wetdata.xlsx","Wetdata"; infer_eltypes = true)...)
IPTCC = wet_data.Normalized_IPTCC
IPTCC = IPTCC[1:tot_weeks]
IPTCC = Float64.(IPTCC)
IPTCC = interpolate(IPTCC, BSpline(Linear()))

# mobil is another forcing function
mobil_data = DataFrame(XLSX.readtable("Mobdata.xlsx","Mobdata"; infer_eltypes = true)...)
mobil = mobil_data.Mean
mobil = mobil[1:tot_weeks]
mobil = Float64.(mobil)
mobil = interpolate(mobil, BSpline(Linear()))

# Reading the observation data
synth_data = DataFrame(XLSX.readtable("synthData_version1.xlsx","Sheet1"; infer_eltypes = true)...)

Section 2: Define the observation model

@model function truth_data_fitting!(data, ODEtspan, num_species, tot_pop, interp_IPTCC, interp_mobil)

    param_prior_df = DataFrame(); # A dataframe container for parameters

    # Priors of parameters - ODE model with two species
    priorβ₁ ~ Uniform(0, 7)
    priorβ₂ ~ Uniform(0, 7)
    priorα ~ Uniform(1.4, 3.5)
    priorθₙ ~ Uniform(0.0008, 0.6993)
    priorθᵢ ~ Uniform(0.000001, 0.14)
    priorγₙ ~ Uniform(0.35, 0.7)
    priorγᵢ ~ Uniform(0.35, 0.7)
    priorγ ~ Uniform(0.8, 3.4965)
    priorλ ~ Uniform(7/1095, 7/730)
    priorδᵢ ~ Uniform(0.35, 0.7)

    # Priors of parameters - Observation model
    σ₁ ~ Gamma(1.0, 5.0) # For observation 1
    σ₂ ~ Gamma(1.0, 5.0) # For observation 2
    σ₃ ~ Gamma(1.0, 5.0) # For observation 3
    σ₄ ~ Gamma(1.0, 5.0) # For observation 4
    σ₅ ~ Gamma(1.0, 5.0) # For observation 5

    # Pushing the ODE model priors to the data frame
    param_prior_df.β = [priorβ₁; priorβ₂]
    param_prior_df.α = [priorα; priorα]
    param_prior_df.θₙ = [priorθₙ; priorθₙ]
    param_prior_df.θᵢ = [priorθᵢ; priorθᵢ]
    param_prior_df.γₙ = [priorγₙ; priorγₙ]
    param_prior_df.γᵢ = [priorγᵢ; priorγᵢ]
    param_prior_df.γ = [priorγ; priorγ]
    param_prior_df.λ = [priorλ; priorλ]
    param_prior_df.δᵢ = [priorδᵢ; priorδᵢ]

    # Final parameter container
    const_params = [num_species, tot_pop, interp_IPTCC, interp_mobil]
    final_parameters = [const_params; param_prior_df]

    # Defining the initial conditions and the problem
    I0₁ ~ Uniform(0.1, 100)
    I0₂ ~ Uniform(0.1, 100)
    u0 = zeros(eltype(I0₁), 5*num_species+5)
    u0[1] = tot_pop
    u0[4] = I0₁
    u0[12] = I0₁
    inference_problem = ODEProblem(our_model_v1!, u0, ODEtspan, final_parameters)

    # Defining the callback function to introduce change in species 2 at time point 20
    v2_seed_time = 20.0
    condition(u,t,integrator) = t==v2_seed_time
    function affect!(integrator)
        integrator.u[5] += I0₂  # For the model equation
        integrator.u[13] += I0₂ # For the cumulative counter
    cb = DiscreteCallback(condition,affect!;save_positions=(false,false))

    # Solve the model
    inference_solution = solve(inference_problem, Tsit5(), saveat = 1.0, callback = cb, tstops=[v2_seed_time])

    # Inference using multivariate normal distributions
    data[:,1] ~ MvNormal(inference_solution[12,:], σ₁*(sqrt.(inference_solution[12,:])))
    data[:,2] ~ MvNormal(inference_solution[13,:], σ₂*(sqrt.(inference_solution[13,:])))
    data[:,3] ~ MvNormal(inference_solution[14,:], σ₃*(sqrt.(inference_solution[14,:])))
    data[:,4] ~ MvNormal(inference_solution[15,:], σ₄*(sqrt.(inference_solution[15,:])))
    data[:,5] ~ MvNormal(inference_solution[11,:], σ₅*(sqrt.(inference_solution[11,:])))

Section 3: Run the inference procedure and save the chains

combo1_model = truth_data_fitting!(synth_data, tspan, V, N, IPTCC, mobil)
chains = sample(combo1_model, NUTS(0.65), 100; progress=true)

When simulating with Turing.setrdcache(true), it throws a Domain error for trying to estimate sqrt for a negative number. But my code doesn’t contain any branches or loops that run for different time periods.

But if I run without caching, the solver is perpetually stuck with the following error and never finds a time step.

┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/HQHnm/src/hamiltonian.jl:47

Can someone please, please help me out? Big thanks in advance.

The ODE solver itself does.

1 Like

Okay, I get it now. Thanks! I shall run without caching. But what is a possible reason for this never-ending warning?

Is your ODE divergence at the starting parameters?

No, my model is not divergent, even if I take out the ODE parameters and initial conditions out.

I did not completely understand your question about starting parameters as I am giving prior distribution and not an initial conditions. Nevertheless, this is an epidemic SIR-type model with fixed population size (parameter N) and the solver incorporates isoutofdomain option. Also, I myself generated the synthetic data and the priors of the parameters contain the true values. I hope it makes sense.

If you run the model not in Turing, is it okay? Are its derivatives okay?

1 Like

Yeah, it is absolutely fine.

  1. The model is of the form dyᵢ/dt = fᵢ(y₁, y₂, ...), where fᵢs are polynomials of integer powers in yᵢs. So, the derivatives are either constant (parameter values) times integer power of yᵢs or zero. An example equation:

here, δs are parameters while H and D are yᵢs. Every other ODE is a simple, mass-action-type, as the one mentioned above.

  1. I had generated data by running the model (not in Turing) and it is as expected. Everything is okay.

@sethaxen could it be the likelihood differentiation issue you were looking into? I forget what triggered that.

1 Like

What I meant was, did you check it with ForwardDiff?

Yes, I checked it with ForwardDiff as well. The same endless cycle of the warnings that I mentioned above: Warning: The current proposal will be rejected due to numerical error(s).

Just to be clear, I tried (a) commenting the below-mentioned lines (and I expect that the default ForwardDiff kicks in), as well as (b) explicitly added the using ForwardDiff and Turing.setadbackend(:forwarddiff). Both are giving the same warning.

Sounds like an HMC adaptivity issue, but I am not sure why.

Oh oh… :cry:

I don’t think it’s that specific one (that was due to a truncated Normal). It’s not immediately clear to me what the issue here is, but I do see one thing that should be changed.

    data[:,1] ~ MvNormal(inference_solution[12,:], σ₁*(sqrt.(inference_solution[12,:])))

These lines are using a deprecated constructor for MvNormal, where a vector of standard deviations are provided. Since MvNormal requires covariance matrices, these are then squared and wrapped in Diagonal to form a diagonal covariance matrix. Instead, you should now use

    data[:,1] ~ MvNormal(inference_solution[12,:], Diagonal(σ₁^2 * inference_solution[12,:])))

I suspect this could be the sqrt that is causing problems. Is inference_solution guaranteed to be non-negative? As has come up in a few places recently, sqrt(0) has unstable derivatives, and if the model and data are such that your inference_solution can ever be 0 (or less), then this could cause NaNs to propagate in the gradient.

It’s been recently noted in a few places for the sqrt(0) case that ForwardDiff only does the right thing when NaN-safe mode is enabled (not the default) using Preferences. See Advanced Usage Guide · ForwardDiff for details and for instructions for setting the preference. Then try sampling using ForwarDiff for the AD backend again and see if that helps at all with the numerical errors.

1 Like

Thanks a lot! I think you have hit the nail on its head. :slight_smile:

Yeah they are non-negative but could go to 0, and now I understand. But two minor questions:

  1. Which package provides me the function Diagonal? I read the documentation of MvNormal and found the PDiagMat of PDMats.jl package. When I use PDiagMat, I get the following continuously and it doesn’t stop spitting these out:

    So, I don’t think PDiagMat isn’t the right one, is it? FYI, this is how I have incorporated your recommendations at the beginning of the code:
using PDMats
using ForwardDiff, Preferences
set_preferences!(ForwardDiff, "nansafe_mode" => true)

and the line performing the inference:

data[:,1] ~ MvNormal(inference_solution[12,:], PDiagMat(σ₁^2*(inference_solution[12,:])))
  1. Can ReverseDiff handle the inference problem after I rewrite the MvNormal with diagonal covariances by using set_preferences!(ReverseDiff, "nansafe_mode" => true)? I tried to search the docs of ReverseDiff but couldn’t find it. Asking this because I read ReverseDiff handles models with large parameters more efficiently.

Following are the package versions, FYI:

Status `C:\Users\vembh\.julia\environments\v1.6\Project.toml`
  [a93c6f00] DataFrames v1.3.2
  [0c46a032] DifferentialEquations v7.1.0
  [31c24e10] Distributions v0.25.14
  [ced4e74d] DistributionsAD v0.6.29
  [f6369f11] ForwardDiff v0.10.25
  [a98d9a8b] Interpolations v0.13.5
  [c7f686f2] MCMCChains v4.14.1
  [6fafb56a] Memoization v0.1.13
  [90014a1f] PDMats v0.11.5
  [21216c6a] Preferences v1.2.3
  [fce5fe82] Turing v0.16.6

Thanks again!

Huh, those messages are strange. It shouldn’t make a difference, but try using Diagonal directly. It’s in the LinearAlgebra prackage.

Can you verify that the preference is set correctly using Preferences.load_preference(ForwardDiff, "nansafe_mode")? If not, create a new folder, activate an environment in the folder using using Pkg; Pkg.activate("folder_name"), then load ForwardDiff and Preferences, set the preference, restart Julia, re-activate the same environment, and check that the preference is set correctly. Then load the rest of the packages and retry.

I don’t think ReverseDiff has a NaN-safe mode. At the moment, we’re just trying to get stability, and ForwardDiff is generally the most robust AD. We can focus on performance later. That being said, I see 17 parameters, and forward-mode AD is often faster than reverse-mode up to hundreds of parameters.

1 Like

Running this command gave out true. So, I presume the preference is set correctly!

Using LinearAlgebra.jl eliminated that weird error, thankfully. Also, thank you for your clarification on ReverseDiff.

But unfortunately, the actual problem of the unending blizzard of Warning: The current proposal will be rejected due to numerical error(s). hasn’t gone away. :sleepy:

I’ve tried multiple times and in different ways. For example, I’ve switched from the MvNormal to Normal distributions in loops (the old naive way) for inferences and the sampler never finds a step size. Just the same problem!

FYI, following are the code files and the .xlsx file containing the observation data (<20 kB) if the error could be found from the actual code: all_files. Hope this helps in identifying the stupid error that I am making. :pensive:

Thanks, I’ll take a look at this tonight and let you know if I have any success.

1 Like

After looking into this a bit, a few notes:

  • At least for me, the type inference for XLSX.readtable did not work. I had to manually set the eltypes before creating the DataFrame
  • Constructing param_prior_df and then filling in the columns is type-unstable. Instead, it’s better to draw the parameters from the priors and then construct the DataFrame from those parameters. e.g.
param_prior_df = DataFrame(
    :β => [β₁, β₂],
    :α => fill(α, 2),
    :θₙ => fill(θₙ, 2),
    :θᵢ => fill(θᵢ, 2),
    :γₙ => fill(γₙ, 2),
    :γᵢ => fill(γᵢ, 2),
    :γ => fill(γ, 2),
    :λ => fill(λ, 2),
    :δᵢ => fill(δᵢ, 2),
  • Even without doing AD, just computing the log-density results in NaNs, e.g. here we use Turing’s mode estimation machinery to get a negative log density function in unconstrained space (where HMC samles) and an initial point drawn from the prior in that space.
julia> prob = optim_problem(combo1_model, MAP(); constrained=false);

julia> prob.prob.f(prob.prob.u0, nothing)
  • A key issue here is I think that inference_solution can be 0. MvNormal requires that the covariance matrix be positive definite, so that every entry of the diagonal must be greater than 0. Intuitively, MvNormal with diagonal covariance is equivalent to a product of Normals. As the standard deviation of Normal goes to 0, its density becomes a Dirac delta function, infinite if data is exactly equal to the mean and 0 everywhere else. This could cause -Inf and +Inf to be added together in the log-density, which causes the log-density to be NaN.
  • I worked around the above for now by adding 1e-3 to inference_solution in the diagonal of the covariances. I don’t recommend this approach though; instead I’d recommend thinking about whether these 0 covariances make sense.
  • After the above, I got non-NaN log-densities, but the numerical errors persisted, which could be due to bad posterior geometry or a bad initialization

To find a good initialization for sampling, we can use Pathfinder.jl. Multi-path Pathfinder constructs an approximation to the posterior from a mixture of multivariate normals:

] add Optim, Pathfinder#retry

julia> using Random, Pathfinder, Optim

julia> prob = optim_problem(combo1_model, MAP(); constrained=false);

julia> rng = MersenneTwister(42);

julia> optimizer = LBFGS(; m=6);

julia> x0s = [rand!(rng, Uniform(-2, 2), similar(prob.prob.u0)) for _ in 1:8];

julia> res = multipathfinder(
           x0s, 1_000;
[ Info: Optimized for 88 iterations (tries: 1). Maximum ELBO of -1742.7 reached at iteration 43.
[ Info: Optimized for 86 iterations (tries: 1). Maximum ELBO of -1740.86 reached at iteration 76.
[ Info: Optimized for 89 iterations (tries: 1). Maximum ELBO of -2350.17 reached at iteration 72.
[ Info: Optimized for 63 iterations (tries: 1). Maximum ELBO of -1597.53 reached at iteration 57.
[ Info: Optimized for 124 iterations (tries: 1). Maximum ELBO of -1656.22 reached at iteration 97.
[ Info: Optimized for 103 iterations (tries: 1). Maximum ELBO of -1710.69 reached at iteration 90.
[ Info: Optimized for 60 iterations (tries: 1). Maximum ELBO of -2343.33 reached at iteration 36.
[ Info: Optimized for 86 iterations (tries: 1). Maximum ELBO of -1718.11 reached at iteration 68.
┌ Warning: Pareto shape k = 3.3 > 1. Corresponding importance sampling estimates are likely to be unstable and are unlikely to converge with additional samples.
└ @ PSIS ~/.julia/packages/PSIS/ZPzVl/src/core.jl:262

Pathfinder was pretty slow here, which implies HMC will be much slower. PSIS.jl’s Pareto shape diagnostic indicates that Pathfinder’s approximation is very bad, which can indicate that Pathfinder failed, or for a model with as few dimensions as this, it can indicate the posterior can’t be well approximated with a multivariate normal (the best posterior geometry for most MCMC methods). Either way, we shouldn’t trust the samples much, but they can often are still useful for initializing HMC or for diagnosing any high-level issues.

julia> q, ϕ, logqϕ = res;

julia> x = mapslices(prob.transform, ϕ; dims=1);

julia> _vi =;

julia> ts = [Turing.Inference.Transition(DynamicPPL.tonamedtuple(_vi), DynamicPPL.getlogp(_vi))];

julia> varnames, _ = Turing.Inference._params_to_array(ts);

julia> chns_pathfinder
┌ Warning: timestamp of type Missing unknown
└ @ MCMCChains ~/.julia/packages/MCMCChains/ctWhw/src/chains.jl:364
┌ Warning: timestamp of type Missing unknown
└ @ MCMCChains ~/.julia/packages/MCMCChains/ctWhw/src/chains.jl:364
Chains MCMC chain (1000×17×1 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 1
Samples per chain = 1000
parameters        = β₁, β₂, α, θₙ, θᵢ, γₙ, γᵢ, γ, λ, δᵢ, σ₁, σ₂, σ₃, σ₄, σ₅, I0₁, I0₂

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat 
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64 

          β₁     6.9985    0.0000     0.0000    0.0000   1005.3697    0.9990
          β₂     6.9863    0.0005     0.0000    0.0000   1024.4776    1.0019
           α     1.4080    0.0004     0.0000    0.0000   1019.0961    1.0014
          θₙ     0.1022    0.0003     0.0000    0.0000   1012.2757    0.9992
          θᵢ     0.0053    0.0000     0.0000    0.0000   1055.3354    0.9999
          γₙ     0.5485    0.0010     0.0000    0.0000    893.3647    1.0009
          γᵢ     0.5160    0.0010     0.0000    0.0000    944.2894    0.9990
           γ     2.0548    0.0016     0.0001    0.0001    900.2580    1.0008
           λ     0.0091    0.0000     0.0000    0.0000   1127.4095    0.9991
          δᵢ     0.5145    0.0014     0.0000    0.0000   1048.8733    0.9990
          σ₁     8.4751    0.1971     0.0062    0.0072    884.1938    1.0009
          σ₂   141.9986    2.4318     0.0769    0.0608   1054.5526    0.9995
          σ₃     3.9532    0.0688     0.0022    0.0020    971.5128    0.9995
          σ₄     0.5181    0.0173     0.0005    0.0006    928.3820    1.0012
          σ₅     0.2112    0.0106     0.0003    0.0003   1052.2068    0.9992
         I0₁     6.4298    0.0395     0.0012    0.0012   1034.2994    1.0006
         I0₂    99.9999    0.0000     0.0000    0.0000    945.1577    1.0005

  parameters       2.5%      25.0%      50.0%      75.0%      97.5% 
      Symbol    Float64    Float64    Float64    Float64    Float64 

          β₁     6.9985     6.9985     6.9985     6.9985     6.9985
          β₂     6.9857     6.9857     6.9865     6.9867     6.9872
           α     1.4075     1.4075     1.4083     1.4083     1.4085
          θₙ     0.1014     0.1023     0.1023     0.1023     0.1024
          θᵢ     0.0053     0.0053     0.0053     0.0053     0.0054
          γₙ     0.5468     0.5480     0.5482     0.5496     0.5496
          γᵢ     0.5122     0.5161     0.5161     0.5168     0.5168
           γ     2.0530     2.0530     2.0550     2.0566     2.0566
           λ     0.0091     0.0091     0.0091     0.0091     0.0091
          δᵢ     0.5115     0.5149     0.5149     0.5149     0.5180
          σ₁     8.2256     8.3736     8.3736     8.7023     8.7023
          σ₂   139.1153   139.2721   143.1914   143.1914   145.7702
          σ₃     3.8563     3.9477     3.9477     3.9530     4.1870
          σ₄     0.4963     0.5011     0.5314     0.5361     0.5361
          σ₅     0.1982     0.2068     0.2084     0.2084     0.2336
         I0₁     6.3699     6.3906     6.4615     6.4615     6.4811
         I0₂    99.9999    99.9999    99.9999    99.9999    99.9999

The main thing that jumps out to me from Pathfinder’s draws is that many of the parameters are concentrated at the edge of their support. That may indicate that the model is strained, i.e. that the model (e.g. the choice of prior or likelihood) is incompatible with the data, so the model is stretching to try to satisfy the data. One way to check this would be to simulate a single dataset from this model (i.e. draw a single set of parameters from the prior and simulate a dataset using the rest of the model.) and then fit the model to that.

Also, σ₂ is suspiciously much higher than the other σ parameters.

Here’s how we can initialize HMC with one of Pathfinder’s draws:

julia> init_params = x[:, 1]
17-element Vector{Float64}:

julia> chns = sample(MersenneTwister(92), combo1_model, NUTS(0.65), 1_000; init_params)

I haven’t let this sample to completion (looks like it would take about half a day), but after warm-up, I’m getting no more numerical error. I did run from a different initialization last night and the resulting samples had not converged after 1000 draws (R-hat values were far from 1) and the transitions saturated the NUTS tree depth of 10, which indicates that it never U-turns, i.e. after an HMC transition of 1024 steps, the final point has not crossed the posterior (what NUTS is adapted to do). This usually indicates that the posterior geometry will be hard for HMC to sample, e.g. some parameters are highly correlated or the log density is heavy-tailed or “bumpy” without clear modes.

So in short, my advice at this stage is 1) think about whether the chosen likelihood with variances of 0 makes sense and 2) check the model by fitting it to data simulated from the same prior and likelihood.


Sorry for the late response @sethaxen as I was trying to implement your suggestions:

Your diagnosis was on point and I was able to solve the problem of the unending blizzard of Warning signs. A small sample of how the inference code is running now:

┌ Info: Found initial step size
└   ϵ = 0.2
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/HQHnm/src/hamiltonian.jl:47
Sampling   0%|▎                                         |  ETA: 11:15:41
Sampling   1%|▍                                         |  ETA: 11:18:10
... # and goes on without any more errors

To summarize the solutions for your respective bullet points:

  1. Yes, the likelihood function was incorrect and I’ve it rectified by eliminating the possibilities of a 0 variance. This works for my problem as well. I converted
    data[:,1] ~ MvNormal(inference_solution[12,:], σ₁*(sqrt.(inference_solution[12,:]))) to
    data[:,1] ~ MvNormal(inference_solution[12,:], σ₁*(sqrt.(data[12,:])))
    and since data is never zero, I eliminated the possibility of zero variances. Nevertheless, biologically, my data does not follow a good normal distribution. So, using a TDist() instead of a MvNormal was even more efficiently accounting for heavy tails.
  2. I have fitted the model to the data simulated from the priors and the code is running error- and ‘almost’ warning-free. A big thanks for your help! :slight_smile:

But now, there is a small issue with the inference results which I am unable to solve: the mean value of the posterior samples is close to the real value, but the chain is not mixing and R-hat is above 1. For example, from a prior of β₁ Uniform(0, 7), the posterior ends up being the following:

The value of approx. 4.6 is very close to the real value, but the chain is not mixing well. R-hat is 1.4151 for this parameter. Surprisingly, the same is the case for other parameters: mixing is improper but the posterior looks like to have converged towards the true value.

What could be the reason and how to possibly rectify it? I have tried 1000-3000 burnin samples (the figure is of 1000 burnin) and yet it is not converging. If this chain mixing problem is solved, it will be of immense help to my project. I can share the codes (now they are error-free as mentioned earlier) if you’d wish to.

Big, big thanks again for the help! :slight_smile:

It seems likely that your posterior geometry is very hard to fit. Perhaps reparameterization of your model would help.

1 Like