Turing.jl: prior on quantiles

jdossgollin · April 18, 2021, 1:29am

I’d like to develop a model with prior on quantiles of the outcome, rather than on the parameters themselves. This is pretty commonly used in expert elicitation contexts. However, when I try to implement this model, I’m clearly not getting the right thing. I’m wondering whether I need to add a Jacobian?

Here’s an MWE:

using Turing, Distributions

@model Demo(y) = begin

    # define the parameters as vaguely as possible -- 
    μ ~ Normal(0, 100)
    σ ~ Truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    q50 ~ Normal(5, 0.1) # our prior on the median is about 5
    q90 ~ Normal(10, 0.1) # our prior on the 90th percentile is about 10

    # data model
    y .~ dist
end

y = [0]
model = Demo(y)

prior = sample(model, Prior(), 10_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50) # returns about 0
quantile(yhat_prior, 0.90) # returns about 130

Thinking that maybe the problem was the Prior(), I re-wrote it as

using Turing, Distributions, DynamicHMC


@model Demo() = begin

    # define the parameters as vaguely as possible
    μ ~ Normal(0, 100)
    σ ~ Truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    q50 ~ Normal(5, 0.1) # our prior on the median is about 5
    q90 ~ Normal(10, 0.1) # our prior on the 90th percentile is about 10
end

prior = sample(Demo(), DynamicNUTS(), 10_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50)
quantile(yhat_prior, 0.90)

but get the same results

torfjelde · April 18, 2021, 1:41am

Hey! This issue shows up because Turing.jl decides what’s an observation and what’s a latent variable by matching variables to the arguments of the model.

This means that anything NOT in the arguments of the model cannot be observed, i.e. your q50 and q90. In cases where you really want to do this, you can make use of the DynamicPPL.@addlogprob! macro, e.g.:

@model Demo(y) = begin
    # define the parameters as vaguely as possible -- 
    μ ~ Normal(0, 100)
    σ ~ Truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    DynamicPPL.@addlogprob! logpdf(Normal(5, 0.1), q50) # our prior on the median is about 5
    DynamicPPL.@addlogprob! logpdf(Normal(10, 0.1), q90) # our prior on the 90th percentile is about 10

    # data model
    y .~ dist
end

And just some general comments on writing models that you might find useful:

You can do @model function Demo(y) ... end instead of the begin ... end notation. You might have come across the latter because, back in the day, the function syntax wasn’t possible; now that it is, it’s the preferred way:) The two are functionally equivalent, so it’s just a matter of style.
Use truncated instead of Truncated. This is not related to Turing.jl btw; it’s just what Distributions.jl recommends, since truncated does some additional work to ensure that the types are consistent.

EDIT: We actually very recently made some changes to the “compiler” used in Turing.jl which allow us to introduce an @observe macro that would give the user a way of “forcing” an observation, regardless of whether or not it’s in the arguments of the model, e.g. you could do

@observe q50 ~ Normal(5, 0.1)

Is this something that would be desirable? If there’s interest in something like that, we might just implement it:)

jdossgollin · April 18, 2021, 3:13am

Dear Tor,

Thanks very much for the quick response on a Saturday night! I’m more familiar with Stan, where you have to say explicitly which variables are observed and which are parameters, so I really appreciate the clarification. It looks like the DynamicPPL.@addlogprob! macro is quite similar to Stan’s target +=. Can I assume that Turing takes care of any needed chain rule/Jacobian adjustments? (Certainly my toy case here is working like this.)

Thanks also for the two tips. I’ll keep those both in mind!

I think that the @observe macro would be quite interesting. I don’t know what other use cases might be, but being able to encode prior distributions on quantities that are derived from parameters – rather than on the parameters themselves – would certainly be useful to me.

For the sake of anyone looking through this in the future, this works:

using Turing, Distributions, DynamicHMC
using DynamicPPL: @addlogprob!


@model function Demo()

    # define the parameters as vaguely as possible
    μ ~ Normal(0, 100)
    σ ~ truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    @addlogprob! logpdf(Normal(3, 0.1), q50) # our prior on the median is about 3
    @addlogprob! logpdf(Normal(10, 0.1), q90) # our prior on the 90th percentile is about 10
end

prior = sample(Demo(), DynamicNUTS(), 100_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50) # 2.9986086587820795
quantile(yhat_prior, 0.90) # 9.982226223019403

but this formulation does not

@model function Demo(y)

    # define the parameters as vaguely as possible
    μ ~ Normal(0, 100)
    σ ~ truncated(Normal(0, 10), 0, Inf)

    # we're going to use this distribution repeatedly
    dist = Normal(μ, σ)

    # define quantiles -- these are deterministic
    q50 = quantile(dist, 0.50)
    q90 = quantile(dist, 0.90)

    # prior distribution on quantiles - pretend this is "from experts"
    DynamicPPL.@addlogprob! logpdf(Normal(3, 0.1), q50) # our prior on the median is about 3
    DynamicPPL.@addlogprob! logpdf(Normal(10, 0.1), q90) # our prior on the 90th percentile is about 10

    y .~ dist
end

y = [0]
prior = sample(Demo(y), Prior(), 100_000)
yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:]
quantile(yhat_prior, 0.50)
quantile(yhat_prior, 0.90)

which (I assume?) has to do with Turing not treating the DynamicPPL.@addlogprob! as part of the prior.

torfjelde · April 19, 2021, 2:28pm

Glad that it helped:)

Can I assume that Turing takes care of any needed chain rule/Jacobian adjustments? (Certainly my toy case here is working like this).

In general yes, but not when you use addlogprob!. When you use addlogprob!, Turing assumes you know what you’re doing and leaves the right-hand side as is. Samplers that work in unconstrained space, e.g. HMC, internally transform μ and σ and adjust the log joint accordingly, but inside the model body you only ever see μ and σ in their original space. “Conditionally deterministic” variables such as q50 and q90, on the other hand, are not something the sampler cares about; all that matters is how they affect the log joint. So there’s no need to transform q50 and q90 to unconstrained space, and thus no Jacobian adjustment to the joint distribution is needed for them.

Does this make sense?

TL;DR: Jacobian adjustments are performed automatically for latent variables when the sampler requires it, e.g. HMC, but the sampler doesn’t care about conditionally deterministic variables, so AFAIK there should never be a case where this is something to think about when using addlogprob!.
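For the Normal model in this thread the point can be checked by hand. A short derivation, assuming the goal is for the elicited densities (call their product p_Q) to describe (q50, q90) and that the map from (μ, σ) to the two quantiles is one-to-one:

```latex
% Normal(\mu, \sigma) quantiles, with z_p = \Phi^{-1}(p), so z_{0.9} \approx 1.2816
q_{50} = \mu, \qquad q_{90} = \mu + z_{0.9}\,\sigma

% Jacobian of the map (\mu, \sigma) \mapsto (q_{50}, q_{90})
J = \begin{pmatrix} 1 & 0 \\ 1 & z_{0.9} \end{pmatrix},
\qquad \lvert \det J \rvert = z_{0.9}

% Change of variables: the quantile density pulled back to (\mu, \sigma)
\log p(\mu, \sigma) = \log p_Q\bigl(\mu,\ \mu + z_{0.9}\,\sigma\bigr) + \log z_{0.9}
```

Here the extra term log z_{0.9} is a constant, so it drops out of MCMC and the two addlogprob! lines are all that is needed. If the quantiles were nonlinear functions of the parameters (e.g. a family that is not location-scale), the log-determinant would in general depend on the parameters and would not cancel. The vague priors on μ and σ also still multiply in, so the elicited densities describe the quantiles only approximately.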

which (I assume?) has to do with Turing not treating the DynamicPPL.@addlogprob! as part of the prior.

What do you mean “doesn’t work” here? You get different quantiles from the above model which is supposed to be the prior?

If so, yeah, that won’t work. Prior is a simple sampler that samples from the right-hand sides of ~ statements. As soon as you want to adjust the log density using some other quantities, e.g. your q50 and q90, you’re back to the problem that MCMC attempts to solve:) It’s a good point though, and maybe we should add an example to our docs talking a bit more about this.
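To see why Prior() ignores the elicitation lines, here is a minimal sketch that emulates what it draws for the original Demo, using Distributions.jl directly: only the ~ right-hand sides for μ and σ are used. This is an illustration under that assumption, not Turing’s actual Prior implementation. Since μ has standard deviation 100 and σ is half-normal with E[σ²] = 100, the prior predictive is roughly Normal(0, √10100), whose 90th percentile is about 129, in line with the “about 0” and “about 130” reported in the question.

```julia
using Distributions, Random, Statistics

# Emulate Prior() for the original Demo: draw μ and σ from their `~` right-hand
# sides only; the q50/q90 "expert" lines play no role in these draws.
rng = MersenneTwister(42)
n = 100_000

μs = rand(rng, Normal(0, 100), n)
σs = rand(rng, truncated(Normal(0, 10), 0, Inf), n)

# Prior predictive draws y ~ Normal(μ, σ), written as μ + σ * ε with ε ~ Normal(0, 1).
yhat = μs .+ σs .* randn(rng, n)

q50_hat = quantile(yhat, 0.50)  # close to 0, not the elicited 5
q90_hat = quantile(yhat, 0.90)  # close to 129, not the elicited 10
println((q50_hat, q90_hat))
```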

But it’s actually possible to sample from the model prior using MCMC methods, though it’s a bit intricate. This essentially amounts to writing the sample loop from AbstractMCMC.jl by hand (see its README for a bit more information), plus using some convenience methods provided by DynamicPPL.jl.

julia> using Turing, Random

julia> rng = Random.MersenneTwister(42);

julia> y = [0.0];

julia> m = Demo(y);

julia> alg = NUTS();

julia> # `spl` is a wrapper from DynamicPPL.jl that inherits a bunch of convenient impls.
       spl = DynamicPPL.Sampler(alg);

julia> spl_initial = DynamicPPL.initialsampler(spl) # Sampler used to initialize `spl`
DynamicPPL.SampleFromUniform()

julia> # Get the initial state using `spl_initial` and specify that we're only interested in
       # the prior by providing the `PriorContext()`.
       initial_state = DynamicPPL.VarInfo(rng, m, spl_initial, DynamicPPL.PriorContext())

julia> # Get the first `transition` and `state` for `spl`.
       transition, state = DynamicPPL.initialstep(rng, m, spl, initial_state);

julia> # Set up a container for the samples using `transition` as the "template" for how samples look.
       samples = AbstractMCMC.samples(transition, m, spl)
Turing.Inference.HMCTransition{NamedTuple{(:μ, :σ), Tuple{Tuple{Vector{Float64}, Vector{String}}, Tuple{Vector{Float64}, Vector{String}}}}, NamedTuple{(:n_steps, :is_accept, :acceptance_rate, :log_density, :hamiltonian_energy, :hamiltonian_energy_error, :max_hamiltonian_energy_error, :tree_depth, :numerical_error, :step_size, :nom_step_size), Tuple{Int64, Bool, Float64, Float64, Float64, Float64, Float64, Int64, Bool, Float64, Float64}}, Float64}[]

julia> # Sample! Notice that we update `state` at every iteration and pass the previous `state` to `step`.
       # `save!!` may return a new container, so we keep its return value.
       for i = 1:10_000
           transition, state = AbstractMCMC.step(rng, m, spl, state)
           samples = AbstractMCMC.save!!(samples, transition, i, m, spl)
       end

julia> # Since `samples` is a simple array of samples we instead bundle those into a `MCMCChains.Chains`.
       prior = AbstractMCMC.bundle_samples(samples, m, spl, state, MCMCChains.Chains)
Chains MCMC chain (10000×14×1 Array{Float64, 3}):

Iterations        = 1:10000
Thinning interval = 1
Chains            = 1
Samples per chain = 10000
parameters        = μ, σ
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat 
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64 

           μ    2.9987    0.0997     0.0010    0.0023   1699.0814    1.0013
           σ    5.4623    0.1076     0.0011    0.0019   2807.4221    1.0005

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

           μ    2.7983    2.9320    3.0014    3.0693    3.1890
           σ    5.2567    5.3890    5.4608    5.5351    5.6749

julia> yhat_prior = rand.(Normal.(prior[:μ], prior[:σ]))[:];

julia> quantile(yhat_prior, 0.50) # ✓
3.018120334070474

julia> quantile(yhat_prior, 0.90) # ✓
10.157027708693064
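As a rough sanity check on the μ and σ summaries above (means of about 2.999 and 5.462), the Normal case can be inverted in closed form: the median pins μ, and the 90th percentile then pins σ through q90 = μ + σ·z₀.₉. A minimal sketch using only Distributions.jl, with the targets 3 and 10 taken from the Demo(y) model; the names μ_star and σ_star are just for illustration.

```julia
using Distributions

# Elicited quantiles from the Demo(y) model (median ≈ 3, 90th percentile ≈ 10).
q50_target = 3.0
q90_target = 10.0

# For Normal(μ, σ): quantile(p) = μ + σ * z_p, with z_p the standard-normal quantile.
z90 = quantile(Normal(), 0.90)

μ_star = q50_target                        # the median of a Normal is its mean
σ_star = (q90_target - q50_target) / z90   # solve q90 = μ + σ * z90 for σ

# Round trip: the implied Normal reproduces the target quantiles.
d = Normal(μ_star, σ_star)
println((μ_star, σ_star))
println((quantile(d, 0.50), quantile(d, 0.90)))
```

σ_star comes out at about 5.462. This ignores the vague priors and the 0.1 spread of the elicitation densities, so it should only be expected to match the chain approximately.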

This is really touching internals though, so I’m not certain you want to rely on it. This is as much for me as for you: I wanted to see whether we could provide this in a neater format. I’m thinking that maybe we should provide a convenient way of saying “only sample the prior, but sample it using alg instead of Prior()”.

EDIT: I opened an issue about this btw: Sample from prior using inference algorithm · Issue #1591 · TuringLang/Turing.jl · GitHub

jdossgollin · April 21, 2021, 2:27am

(RE Jacobian) yes, this makes sense - thanks.
(RE doesn’t work) – yes, that’s what I mean. Again, it makes sense that Turing doesn’t think the @addlogprob! lines are part of “prior”.
(RE implementation) looks interesting! Glad to be of service testing things out on my real data which is (only slightly) more complicated than this contrived Normal example.
