Turing.jl: prior on quantiles

I’d like to develop a model with prior on quantiles of the outcome, rather than on the parameters themselves. This is pretty commonly used in expert elicitation contexts. However, when I try to implement this model, I’m clearly not getting the right thing. I’m wondering whether I need to add a Jacobian?

Here’s a MWE

using Turing, Distributions

@model Demo(y) = begin

    # define the parameters as vaguely as possible -- 
    μ ~ Normal(0, 100)
    σ ~ Truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    q50 ~ Normal(5, 0.1) # our prior on the median is about 5
    q90 ~ Normal(10, 0.1) # our prior on the 90th percentile is about 10

    # data model
    y .~ dist
end

y = [0]
model = Demo(y)

prior = sample(model, Prior(), 10_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50) # returns about 0
quantile(yhat_prior, 0.90) # returns about 130

Thinking that maybe the problem was the Prior(), I re-wrote it as

using Turing, Distributions, DynamicHMC


@model Demo() = begin

    # define the parameters as vaguely as possible
    μ ~ Normal(0, 100)
    σ ~ Truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    q50 ~ Normal(5, 0.1) # our prior on the median is about 5
    q90 ~ Normal(10, 0.1) # our prior on the 90th percentile is about 10
end

prior = sample(Demo(), DynamicNUTS(), 10_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50)
quantile(yhat_prior, 0.90)

but get the same results

Hey! This issue shows up because Turing.jl decides what’s an observation and what’s a latent variable by matching variables to the arguments of the model.

This means that anything NOT in the arguments of the model cannot be observed, i.e. your q50 and q90. In cases where you really want to do this, you can make use of the DynamicPPL.@addlogprob! macro, e.g.:

@model Demo(y) = begin
    # define the parameters as vaguely as possible -- 
    μ ~ Normal(0, 100)
    σ ~ Truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    DynamicPPL.@addlogprob! logpdf(Normal(5, 0.1), q50) # our prior on the median is about 5
    DynamicPPL.@addlogprob! logpdf(Normal(5, 0.1), q90) # our prior on the 90th percentile is about 10

    # data model
    y .~ dist
end

And just some general comments on writing models that you might find useful:

  • You can do @model function Demo(y) ... end instead of the begin ... end notation. You might have come across this because back in the day the function syntax wasn’t possible, but now that it is, that’s the preferable way:) Functionally equivalent, it’s just a matter of style.
  • Use truncated instead of Truncated. This is not related to Turing.jl btw, it’s just what’s recommended by Distributions.jl :+1: It does some additional work to ensure that types are consisent.

EDIT: We actually very recently made some changes to the “compiler” used in Turing.jl which allows us to introduce a @observe macro that would give the user a way of “forcing” an observation, irregardless of whether or not it’s in the arguments of the model, e.g. you could do

@observe q50 ~ Normal(dist, 0.50)

Is this something that would be desirable? If there’s interest in someting like that, we might just implement it:)

4 Likes

Dear Tor,

Thanks very much for the quick response on a Saturday night! I’m more familiar with stan, which you have to tell explicitly which variables are observed or parameters, so I really appreciate the clarification. It looks like the DynamicPPL.@addlogprob! function is quite similar to stan’s target +=. Can I assume that Turing takes care of any needed chain rule/Jacobian adjustments? (Certainly my toy case here is working like this).

Thanks also for the two tips. I’ll keep those both in mind!

I think that the @observe macro would be quite interesting. I don’t know what other use cases might be, but being able to encode prior distributions on quantities that are derived from parameters – rather than on the parameters themselves – would certainly be useful to me.

For the sake of anyone looking through this in the future, this works:

using Turing, Distributions, DynamicHMC
using DynamicPPL: @addlogprob!


@model function Demo()

    # define the parameters as vaguely as possible
    μ ~ Normal(0, 100)
    σ ~ truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    DynamicPPL.@addlogprob! logpdf(Normal(3, 0.1), q50) # our prior on the median is about 5
    DynamicPPL.@addlogprob! logpdf(Normal(10, 0.1), q90) # our prior on the 90th percentile is about 10
end

prior = sample(Demo(), DynamicNUTS(), 100_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50) # 2.9986086587820795
quantile(yhat_prior, 0.90) # 9.982226223019403

but this formulation does not

@model function Demo(y)

    # define the parameters as vaguely as possible
    μ ~ Normal(0, 100)
    σ ~ truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    DynamicPPL.@addlogprob! logpdf(Normal(3, 0.1), q50) # our prior on the median is about 5
    DynamicPPL.@addlogprob! logpdf(Normal(10, 0.1), q90) # our prior on the 90th percentile is about 10

    y .~ dist
end

y = [0]
prior = sample(Demo(y), Prior(), 100_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50)
quantile(yhat_prior, 0.90)

which (I assume?) has to do with Turing not treating the DynamicPPL.@addlogprob! as part of the prior.

Glad that it helped:)

Can I assume that Turing takes care of any needed chain rule/Jacobian adjustments? (Certainly my toy case here is working like this).

In general yes, but not in the case when you do addlogprob!. When you use addlogprob!, Turing assumes the user knows what they’re doing and leaves the RHS as is. So μ and σ will internally be transformed and the logjoint will be adjusted accordingly, but in the model-body you will only see μ and σ in their original space despite the sampler working with the unconstrained variables, e.g. HMC. But “conditionally deterministic” variables such as q50 and q90 the sampler doesn’t care about; all that matters is that these affect the logjoint. And so there’s no need to transform q50 and q90 to unconstrained space, and thus no need to deal with jacobian adjustments to the jointdistribution.

Does this make sense?

TL;DR: jacobian adjustments are performed automatically for latent variables when the sampler requires it, e.g. HMC, but the sampler doens’t care about conditionally deterministic variables and so AFAIK there should never be a case where this is something to think about when using addlogprob!.

which (I assume?) has to do with Turing not treating the DynamicPPL.@addlogprob! as part of the prior.

What do you mean “doesn’t work” here? You get different quantiles from the above model which is supposed to be the prior?

If so, yeah that won’t work. Prior is a simple sampler that samples from the right-hand sides of ~ statements. As soon as you want adjust the logdensity using some other quantities, e.g. your q50 and q90, then your back to the issue that MCMC attempts to solve:) But it’s a good point though, and maybe we should add an example in our docs talking a bit more about this.

But it’s actually possible to sample from the model prior using MCMC methods, though it’s a bit intricate. This is essentially writing the sample from AbstractMCMC.jl by hand (see the README for a bit more information) + using some convenience methods provided by DynamicPPL.jl.

julia> using Turing

julia> rng = Random.MersenneTwister(42);

julia> y = [0.0];

julia> m = Demo(y);

julia> alg = NUTS();

julia> # `spl` is a wrapper from DynamicPPL.jl that inherits a bunch of convenient impls.
       spl = DynamicPPL.Sampler(alg);

julia> spl_initial = DynamicPPL.initialsampler(spl) # Sampler used to initialize `spl`
DynamicPPL.SampleFromUniform()

julia> # Get the initial state using `spl_initial` and specify that we're only interested in
       # the prior by providing the `PriorContext()`.
       initial_state = DynamicPPL.VarInfo(rng, m, spl_initial, DynamicPPL.PriorContext())

julia> # Get the first `transition` and `state` for `spl`.
       transition, state = DynamicPPL.initialstep(rng, m, spl, var_info);

julia> # Set up a container for the samples using `transition` as the "template" for how samples look.
       samples = AbstractMCMC.samples(transition, m, spl)
Turing.Inference.HMCTransition{NamedTuple{(:μ, :σ), Tuple{Tuple{Vector{Float64}, Vector{String}}, Tuple{Vector{Float64}, Vector{String}}}}, NamedTuple{(:n_steps, :is_accept, :acceptance_rate, :log_density, :hamiltonian_energy, :hamiltonian_energy_error, :max_hamiltonian_energy_error, :tree_depth, :numerical_error, :step_size, :nom_step_size), Tuple{Int64, Bool, Float64, Float64, Float64, Float64, Float64, Int64, Bool, Float64, Float64}}, Float64}[]

julia> # Sample! Notice that we update the `state` at every iteration and pass the previous `state` to `step`.
       for i = 1:10_000
           sample, state = AbstractMCMC.step(rng, m, spl, state)
           AbstractMCMC.save!!(samples, sample, i, m, spl)
       end

julia> # Since `samples` is a simple array of samples we instead bundle those into a `MCMCChains.Chains`.
       prior = AbstractMCMC.bundle_samples(samples, m, spl, state, MCMCChains.Chains)
Chains MCMC chain (10000×14×1 Array{Float64, 3}):

Iterations        = 1:10000
Thinning interval = 1
Chains            = 1
Samples per chain = 10000
parameters        = μ, σ
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat 
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64 

           μ    2.9987    0.0997     0.0010    0.0023   1699.0814    1.0013
           σ    5.4623    0.1076     0.0011    0.0019   2807.4221    1.0005

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           μ    2.7983    2.9320    3.0014    3.0693    3.1890
           σ    5.2567    5.3890    5.4608    5.5351    5.6749

julia> yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:];

julia> quantile(yhat_prior, 0.50) # ✓
3.018120334070474

julia> quantile(yhat_prior, 0.90) # ✓
10.157027708693064

This is really touching internals though, so not certain if you want to rely on this. This is as much for me as for you, as I wanted to see if we could provide this in a neat format :sweat_smile: Thinking that maybe we should provide a convenient way of saying “only sample the prior, but sample it using alg instead of Prior()”.

EDIT: I made a issue regarding this btw: Sample from prior using inference algorithm · Issue #1591 · TuringLang/Turing.jl · GitHub

1 Like
  1. (RE Jacobian) yes, this makes sense - thanks.
  2. (RE doesn’t work) – yes, that’s what I mean. Again, it makes sense that Turing doesn’t think the @addlogprob! lines are part of “prior”.
  3. (RE implementation) looks interesting! Glad to be of service testing things out on my real data which is (only slightly) more complicated than this contrived Normal example.
1 Like