Turing.jl: fix some parameters at their prior distribution

I’m testing out Turing.jl and I’m wondering if I can specify a subset of parameters to remain fixed at their prior distribution. I would still like to sample these parameters to get a joint distribution with the other parameters that I do want the posterior for.

For example, say I have the following model. Can I specify β to remain at its prior distribution?

@model bayes_sir(y) = begin
    # Calculate number of timepoints
    l = length(y)
    i₀  ~ Uniform(0.0,1.0)
    β ~ Uniform(0.0,1.0)
    I = i₀*1000.0
    u0=[1000.0-I,I,0.0,0.0]
    p=[β,10.0,0.25]
    tspan = (0.0,float(l))
    prob = ODEProblem(sir_ode!,
            u0,
            tspan,
            p)
    sol = solve(prob,
                Tsit5(),
                saveat = 1.0)
    sol_C = Array(sol)[4,:] # Cumulative cases
    sol_X = sol_C[2:end] - sol_C[1:(end-1)]
    for i in 1:l
      y[i] ~ Poisson(sol_X[i])
    end
end;

I don’t know the answer, but do you mean something like the cut function in WinBUGS?

That looks similar, but perhaps my goal is simpler. I don’t want to update the distribution of a subset of parameters at all, not even using a subset of the data.

Every ~ indicates that we perform inference for that parameter and that we account for it in the joint probability. But you can simply draw beta using the rand function and add its log probability to the joint.
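
One way to write that inside bayes_sir, assuming your Turing version has the @addlogprob! macro for adding the extra term, is roughly:

i₀ ~ Uniform(0.0,1.0)
# β is drawn with rand rather than ~, so the sampler never proposes new values for it;
# its prior log density is added to the joint by hand:
β = rand(Uniform(0.0,1.0))
Turing.@addlogprob! logpdf(Uniform(0.0,1.0), β)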

Does that help?
May I also ask why you don’t want to update the prior?

You can use the approach outlined here Is there a turing alternative to pm.Deterministic from pymc3? - #3 by torfjelde to keep track of beta.

You’re trying to do inference on a subset of the parameters? Like if you have parameters a and b, you want to keep the distribution of a at its prior but do inference on b?

You could do this by sampling the “fixed” parameters from their priors, then, for each of those sampled values, running inference on b conditional on a being held at that value. When you pool all the runs, the marginal distribution of a would still be the prior, but b would follow p(b | a, data) for each sampled value of a.
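
In code, that two-stage scheme might look roughly like this, with a hypothetical bayes_sir_fixed(y, β) model that takes β as a plain argument instead of sampling it:

chains = map(1:100) do _
    # Stage 1: draw the "fixed" parameter from its prior.
    β_draw = rand(Uniform(0.0, 1.0))
    # Stage 2: infer the remaining parameters conditional on this β.
    sample(bayes_sir_fixed(y, β_draw), NUTS(), 500)
end
# Pooled across runs, β stays at its prior while i₀ follows p(i₀ | β, y).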

Great, so something like this:

@model bayes_sir(y) = begin
    # Calculate number of timepoints
    l = length(y)
    i₀  ~ Uniform(0.0,1.0)
    β = rand(Uniform(0.0,1.0))
    I = i₀*1000.0
    u0=[1000.0-I,I,0.0,0.0]
    p=[β,10.0,0.25]
    tspan = (0.0,float(l))
    prob = ODEProblem(sir_ode!,
            u0,
            tspan,
            p)
    sol = solve(prob,
                Tsit5(),
                saveat = 1.0)
    sol_C = Array(sol)[4,:] # Cumulative cases
    sol_X = sol_C[2:end] - sol_C[1:(end-1)]
    for i in 1:l
      y[i] ~ Poisson(sol_X[i])
    end
end;

and then use the trick from Is there a turing alternative to pm.Deterministic from pymc3? - #3 by torfjelde to keep track of the samples from beta?

There are a couple of applications I had in mind:

  1. For computational efficiency, when I know that a parameter has no influence on the likelihood.

  2. When the prior was estimated from some external source that I think is more reliable than my own data. For example, say I observe deaths in patients from treatments A and B. A clinical trial reports a hazard ratio (hr) for the effectiveness of A vs. B. I assume the likelihood is deaths_a ~ Poisson(lambda * hr) and deaths_b ~ Poisson(lambda). I want to run inference on lambda but leave hr unchanged from its prior (see the sketch below).
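
To make the second case concrete, here is a rough sketch of that model in Turing; the priors are invented placeholders, not values from any actual trial:

@model deaths_model(deaths_a, deaths_b) = begin
    # Hazard ratio, with a prior standing in for the external trial estimate:
    hr ~ truncated(Normal(0.8, 0.1), 0.0, Inf)
    # Baseline event rate, to be inferred from my own data:
    lambda ~ Exponential(1.0)
    deaths_a ~ Poisson(lambda * hr)
    deaths_b ~ Poisson(lambda)
end;

The question is how to stop deaths_a and deaths_b from updating hr away from that prior.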

I think we should be able to formally support this use case by allowing the Prior inference algorithm to be a GibbsComponent. Then you can use Gibbs to specify the splitting.

I don’t think that’s a good example. What you’d do here is use the prior information from the previous reports to provide a very tight prior, then include hr in your parameter set and do inference on it. If there’s not much information in your data, it won’t change much from the prior; on the other hand, if there is information in your data set, it will change from the prior, and that’s appropriate. Basically, a tight prior from a previous study is still just a prior, not a set-in-stone fact (unless the prior is extremely tight, in which case you might as well just enter a number directly into your model).

I agree with @dlakelan. If you have strong prior assumptions, it is better to incorporate these into the generative process using a tight prior or logical constraints.


The prior for hr estimated from the clinical trial could be wide, but it is an unbiased estimate of hr because patients were randomized to treatment arms. My own observational data may have some unknown selection effects, so I want to build a model that constrains hr to be consistent with the results of the clinical trial, without letting my own data influence hr. I think this idea is similar to the reasoning discussed here http://statistik.wu-wien.ac.at/research/talks/resources/Plummer_WU_2015.pdf except that the data Z is not available and I only have access to the distribution for φ already estimated from Z by somebody else.

I don’t think this approach will work because inference on beta still occurs. For example:

@model bayes_sir(y) = begin
    # Calculate number of timepoints
    l = length(y)
    i₀  ~ Uniform(0.0,1.0)
    # β ~ Uniform(0.0,1.0)
    β ~ Determin(rand(Uniform(0.0, 1.0)))
    I = i₀*1000.0
    u0=[1000.0-I,I,0.0,0.0]
    p=[β,10.0,0.25]
    tspan = (0.0,float(l))
    prob = ODEProblem(sir_ode!,
            u0,
            tspan,
            p)
    sol = solve(prob,
                Tsit5(),
                saveat = 1.0)
    sol_C = Array(sol)[4,:] # Cumulative cases
    sol_X = sol_C[2:end] - sol_C[1:(end-1)]
    for i in 1:l
      y[i] ~ Poisson(sol_X[i])
    end
end;

ode_smc = sample(bayes_sir(generate_data(1234)), SMC(), 10000);

returns

Quantiles
  parameters    2.5%   25.0%   50.0%   75.0%   97.5%
  ──────────  ──────  ──────  ──────  ──────  ──────
          i₀  0.0107  0.0107  0.0107  0.0192  0.0192
           β  0.0435  0.0435  0.0501  0.0501  0.0514

So beta is no longer at its prior distribution. I guess I would need to do something like the 2-stage process @dlakelan described.

I imagine this would be very computationally expensive if I implemented it myself, because I would have to re-run the Turing sampler for each sample of beta.

I think the issue is that you’re building the wrong kind of model.

If you think the clinical trial gave a good, unbiased estimate of a distribution for beta, and that your data has a bias, then include a parameter in your model that represents the bias. Instead of using beta to describe your data, use beta + bias.

Another way of saying this is “partial pooling” or “hierarchical modeling”.

beta ~ my_clinical_prior
mystudybeta ~ my_bias_size(beta)

mydata ~ my_likelihood(mystudybeta)

This is entirely the same concept, except it is also easier to incorporate logical constraints if you formulate it this way.
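
For concreteness, one way that hierarchical sketch could be written in Turing is below; the priors and the bias scale are invented placeholders:

@model pooled_deaths_model(deaths_a, deaths_b) = begin
    # "Clinical" hazard ratio, with a prior standing in for the external trial estimate:
    hr_trial ~ truncated(Normal(0.8, 0.1), 0.0, Inf)
    # Study-specific hazard ratio, allowed to drift from the trial value;
    # the 0.5 bias scale is arbitrary and controls how loose the coupling is:
    hr_study ~ truncated(Normal(hr_trial, 0.5), 0.0, Inf)
    lambda ~ Exponential(1.0)
    deaths_a ~ Poisson(lambda * hr_study)
    deaths_b ~ Poisson(lambda)
end;

Widening the bias scale lets the data pull hr_study around without moving hr_trial much; shrinking it towards zero recovers the original model, where the trial prior is updated directly.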

Okay, but I’m still interested to know how I could write code to force beta to stay at its prior.

I guess as @mohamed82008 said, this could be achieved by something like:

chn = sample(bayes_sir(generate_data(1234)), Gibbs(NUTS(-1, 0.65, :i₀), Prior(:β)), 1000)

Think about what it means for Beta to stay at its prior distribution with respect to proper Bayesian inference. It means that the likelihood function is a constant with respect to Beta. And this in turn means that Beta doesn’t inform you about what your data might be.

All you have to do is simply don’t use Beta :wink:

As the distribution for the biased beta becomes infinitely wide, the distribution for the unbiased beta tends towards its prior.

There is no way to do actual Bayesian inference where you use the distribution for beta and don’t distort it in the posterior. If you make the bias infinitely wide, you don’t use the beta; if you make it less than infinitely wide, you use the beta information but distort it somewhat. It’s a continuum of possibilities. Simply choose the place on the continuum where you’re happy.

Think about it like this… you have three balls connected by springs. The ball on the right is your data, C, the ball in the middle is your biased parameter, B, and the ball on the left is your clinical Beta, A. Like this:

A ---- B ----- C

If you make the spring between A,B very very soft… then if you pull on C to put it where the real data lies… you’ll bring B along with you, but A will stay behind because B isn’t pulling on it very much…

If you tighten up the spring between A, B you’ll pull A along with B somewhat. If you make A,B connected by a rigid rod with zero length, A will come along and always be exactly where B is.

The model you started with is the case where A and B are connected by a rigid rod with zero length. By partially decoupling A and B using a “soft spring” (namely a wide prior on B centered around the location of A), A will stay mostly where it was. If you completely decouple them by giving B some other prior that doesn’t involve A at all, then you’ll sample from the clinical prior for A, but it won’t affect your choice of B at all.