Domain error for Turing model with callback

Okay, great! No divergences is good.

R-hats far from 1 with no divergences usually indicate posterior geometry that is hard to sample: multimodality (which your posterior plot suggests), high correlations, heavy tails, or even an improper posterior. We can use draws with a bad R-hat to diagnose issues, but at this point I wouldn’t put any stock in expectations computed from the posterior, such as the mean, even if they agree with what you would expect.

Before you try to fix this, it would be useful to know whether this happens only with real data or also with simulated data. Have you tried simulating data from the prior and likelihood and then fitting the model to it? Does that sample fine?
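
For illustration, here is the usual pattern in Turing.jl for forward-simulating data from the prior predictive and then fitting it back; the toy model below is a hypothetical stand-in for your ODE model:

using Turing

# Toy stand-in for the real ODE model; `y` is the observed data.
@model function toy(y)
    θ ~ Normal(0, 1)
    y ~ Normal(θ, 1)
    return y
end

# Passing `missing` makes `y` a random variable, so calling the model
# instance forward-simulates a dataset from the prior predictive.
y_sim = toy(missing)()

# Fit the model back to its own simulated data; if this also shows bad
# R-hats, the issue lies in the model rather than the real data.
chn = sample(toy(y_sim), NUTS(), 1_000)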

As @dlakelan said, multimodality is best addressed through reparameterization. This is model-specific and not always possible, but it is worth looking into. The easiest kind to fix is multimodality arising from a permutation/translation symmetry between parameters: e.g., adding x to one parameter and subtracting it from another leaves the log-density unchanged, or swapping two parameters does. Multimodality causes problems for sampling, yes, but it also makes interpretation of the posterior challenging due to non-identifiability. If reparameterization and everything else fails, you could try sampling with replica exchange (parallel tempering); MCMCTempering.jl might be useful, but this won’t help with interpretation.
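
As a minimal sketch (not your model), here is what such a translation symmetry and one possible reparameterization look like in Turing.jl; all names are hypothetical:

using Turing

# Only a + b enters the likelihood, so adding x to `a` and subtracting x
# from `b` leaves the log-density unchanged: the posterior has a ridge
# along a + b = const, which hurts both sampling and interpretation.
@model function symmetric(y)
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    y ~ Normal(a + b, 1)
end

# Reparameterize by sampling the identified quantity s = a + b directly;
# Normal(0, sqrt(200)) matches the prior implied on a + b above.
@model function reparameterized(y)
    s ~ Normal(0, sqrt(200))
    y ~ Normal(s, 1)
end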

To check for high correlations, use pair/corner plots. High correlations likewise can often be addressed through reparameterizations. Sometimes using a dense metric with NUTS can help here, as it not only attempts to equalize the scale of parameters but also tries to decorrelate them for sampling.
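
A quick numerical check is the posterior correlation matrix of the draws; the snippet below uses a stand-in chain so it runs on its own, and the dense-metric line assumes a Turing version whose NUTS exposes the metricT keyword:

using MCMCChains, Statistics

# Stand-in chain (1_000 draws, 3 parameters, 2 chains); in practice use
# the Chains object returned by `sample`.
chn = Chains(randn(1_000, 3, 2))

# Posterior correlation matrix: entries near ±1 flag parameter pairs that
# may benefit from reparameterization.
post = Array(chn)   # draws × parameters, chains concatenated
C = cor(post)

# A dense metric adapts a full mass matrix, which also decorrelates the
# parameters for sampling:
# NUTS(1_000, 0.65; metricT = AdvancedHMC.DenseEuclideanMetric)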

Heavy tails are a bit more subtle. One way to check for them is to use the ArviZ.plot_energy function and look for an E-BFMI below roughly 0.3, which is bad. This often goes together with the tree depth occasionally saturating (some transitions never u-turn because they’re stuck in the tails). Fixing it requires thinking some more about whether the priors and likelihood are sensible, though this isn’t necessarily a model problem.
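
With ArviZ.jl that check might look like the following, assuming `chn` is a Chains object produced by NUTS (so the sampler’s energy statistics are attached):

using ArviZ

# Convert the Turing chain, including its internal NUTS statistics, and
# draw the energy plot; a large mismatch between the marginal and
# transition energy distributions, or E-BFMI below ~0.3, suggests heavy tails.
idata = from_mcmcchains(chn)
plot_energy(idata)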

An improper posterior is one whose density integrates to infinity, i.e. it has infinite probability mass and cannot be normalized, so the chains will never converge. It happens, but I haven’t encountered it in practice, so I can’t give advice from experience here.


Thank you @sethaxen and @dlakelan for the responses. I’ve made two observations in my model from your suggestions, and I think you might find the root of the problem in either of them.

++++++++++++++++++++++++++++++++++++++++++++

I think my model has a swap symmetry (in the sense you described) between two parameters, and I explain below why I think so. Can you please read it and tell me whether I am correct?

In my model, some of the ODEs (where I think the problem is) are:

# ODEs which govern the dynamics of my system
dI/dt = α*E - (γ + θ₁ + θ₂)*I # eq1
dH₁/dt = θ₁*I - γ₁*H₁         # eq2
dH₂/dt = θ₂*I - (γ₂+δ)*H₂     # eq3

# Dummy ODEs whose variables are used to perform the inference
dIₘₑₐₛ/dt = α*E               # eq4
dH₁ₘₑₐₛ/dt = θ₁*I             # eq5
dH₂ₘₑₐₛ/dt = (θ₁+θ₂)*I        # eq6

Basically, Iₘₑₐₛ, H₁ₘₑₐₛ, and H₂ₘₑₐₛ are the variables that are tracked and go into the likelihood function for inference. eq1–eq3 are the only places where the θs enter the actual dynamics, and in eq1 they appear as a sum. Moreover, the priors for γ₁, γ₂, and δ are identical for biological reasons. Now:

  1. both eq5 and eq6 are used for inference with one of them having the combination θ₁+θ₂,
  2. eq1, which governs the actual dynamics, also has the combination θ₁+θ₂, and
  3. both H₁ₘₑₐₛ, and H₂ₘₑₐₛ are used for inference.

I have a strong feeling that a swap symmetry is arising between θ₁ and θ₂ in this specific set of equations because of the features above. I think multiple solutions are possible for the θs, γs, and δ as a result, causing multimodality in the posterior. By the way, no other ODEs in my model have this feature. Does this make sense, or is my observation wrong? If it makes sense, what are potential workarounds?

++++++++++++++++++++++++++++++++++++++++++++

The data I am using for inference is generated from the exact same model; in other words, it is synthetic rather than real-world data. Moreover, the priors I chose for the parameters do contain the true values used to generate the synthetic data, to which I then added noise. Is doing this once (counting one synthetic-data generation as one attempt) enough to identify the issue, or should I repeat it multiple times? The model has 9 parameters, FYI.

++++++++++++++++++++++++++++++++++++++++++++

Thanks again! 🙂 I think I am very close to solving my problem.

Perform an identifiability analysis:

https://mtk.sciml.ai/dev/tutorials/parameter_identifiability/
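
With StructuralIdentifiability.jl, the check on the subset of equations posted above might look roughly like this; the E'(t) equation is an assumed closure (the full system wasn’t shown), and the observable names are hypothetical:

using StructuralIdentifiability

# Subset of the system from the earlier post; E'(t) is a placeholder.
ode = @ODEmodel(
    E'(t) = -alpha * E(t),
    I'(t) = alpha * E(t) - (gamma + theta1 + theta2) * I(t),
    H1'(t) = theta1 * I(t) - gamma1 * H1(t),
    H2'(t) = theta2 * I(t) - (gamma2 + delta) * H2(t),
    Imeas'(t) = alpha * E(t),
    H1meas'(t) = theta1 * I(t),
    H2meas'(t) = (theta1 + theta2) * I(t),
    y1(t) = Imeas(t),
    y2(t) = H1meas(t),
    y3(t) = H2meas(t)
)

# Reports :globally, :locally, or :nonidentifiable for each parameter.
assess_identifiability(ode)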


I don’t understand: how do H₁ and H₂ enter into all this? The (γ₂+δ) is certainly not individually identifiable, and even if you can identify (θ₁+θ₂) you won’t identify (γ+θ₁+θ₂).

All the parameters in my model are globally identifiable, as per StructuralIdentifiability.jl’s algorithm. SIAN.jl code is still running. Following are the results:

[ Info: Global identifiability assessed in 24.004288386 seconds
Dict{Sym{Real, Base.ImmutableDict{DataType, Any}}, Symbol} with 11 entries:
  alpha   => :globally
  theta2  => :globally
  lambda1 => :globally
  gamma   => :globally
  delta   => :globally
  lambda  => :globally
  gamma1  => :globally
  beta    => :globally
  theta1  => :globally
  gamma2  => :globally
  lambda2 => :globally

I had posted only six of the 12 ODEs; H₁ and H₂ are two of the dependent variables in my system. Nevertheless, it looks like all my parameters are identifiable! I do know that structural identifiability does not guarantee successful parameter inference, but the problem is perhaps not in my model.


So identifiability is good, but as you say it doesn’t guarantee you can sample. For example, there is the “funnel” issue, where one parameter controls the scale of another: as the first parameter gets smaller, the second requires ever tinier steps to sample. There is also the possibility of widely varying scales and strong correlations, or multimodality not caused by things like symmetries.
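
The classic example is Neal’s funnel; here is a minimal Turing.jl sketch of the centered (pathological) form alongside the non-centered reparameterization that fixes it:

using Turing

# Centered form: τ sets the scale of x, so a step size that works in the
# wide part of the funnel is far too large in the narrow neck.
@model function funnel_centered()
    τ ~ Normal(0, 3)
    x ~ Normal(0, exp(τ / 2))
end

# Non-centered form: sample a standard normal and rescale deterministically,
# which removes the scale coupling between the parameters.
@model function funnel_noncentered()
    τ ~ Normal(0, 3)
    z ~ Normal(0, 1)
    x = z * exp(τ / 2)
    return x
end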


It doesn’t guarantee it because you could still have too little data. Structural identifiability just means that, if you had complete data on the time series for the observables you’ve stated (i.e. a value at every t for each chosen observable), then there is a globally unique optimal parameter set. That said, if you only have measurements at a few values of t, it might still be tough to discern some parameters (this would have to be assessed via practical identifiability analysis), and you’d have to make sure the fitting is numerically easy. However, it’s a strong indicator that the problem is in principle solvable.

Try tightening (decreasing) the solver tolerances. More accurate solutions, and hence more accurate derivatives, usually help inference.
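
In DifferentialEquations.jl terms that just means passing smaller abstol/reltol values to solve; a toy problem stands in for the real model below:

using OrdinaryDiffEq

# Toy ODE standing in for the real system; only the tolerance keywords matter.
f(u, p, t) = -p[1] * u
prob = ODEProblem(f, 1.0, (0.0, 10.0), [0.5])

# Defaults are roughly reltol = 1e-3, abstol = 1e-6; tightening them gives
# more accurate trajectories (and gradients), at extra compute cost.
sol = solve(prob, Tsit5(); abstol = 1e-10, reltol = 1e-10)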


At first glance, I don’t see strong evidence for a swap symmetry here. Yes, θ₁ and θ₂ appear added together, but they also enter the dH₁/dt and dH₂/dt equations in different ways. Unfortunately, resolving multimodality when the model includes ODEs is beyond my expertise.


That kind of multimodality, if it affected identifiability, would be caught by the structural identifiability analysis. Given that the model is globally identifiable, it shouldn’t be an issue; indeed, the asymmetry is enough to break the symmetry.


Chris, I’m not familiar with what the structural identifiability analysis is doing: what would it say if you have one mode with a tall peak and one mode with a shorter peak?

With infinite data points along the time course there would be no uncertainty and there would be one global peak left.

Perhaps in the absence of measurement error.

Structural identifiability analysis assumes noise-free measurements. But another way to think about it is that if you had infinite data and your measurement error were unbiased, then your mean estimate would be noise-free with zero variance.
