[Turing.jl] How to generate samples ("observations") from a model

Let's say I define a Turing model and would like to use it to generate not only samples of θ from the prior but also the associated y observations. Here is an example:

@model function mymodel(y::AbstractVector{F}) where {F<:Real}
  T = length(y)
  θ ~ Uniform(0, 1)
  y ~ MvNormal(θ * Diagonal(ones(T)))
  return y # BTW I am not sure if this last line is useful
end

If I do rand(mymodel(y)) for some

y = randn(10) # for rand(mymodel(y)) the y values do not matter, I believe

I get a sample for θ from its prior (uniform here).
However, I would also like to get an associated realization for y, i.e., y_realization = rand(MvNormal(θ * Diagonal(ones(T)))).

Of course, I could hand-code the sampler function for y, but that seems redundant since the distribution is already in my model.
I tried looking into various functions of DynamicPPL.jl, but I cannot work out how to extract and use the
y ~ MvNormal(θ * Diagonal(ones(T))) line. If such a function already exists, I missed it.

@model function mymodel2(y::AbstractVector{F}) where {F<:Real}
  T = length(y)
  θ ~ Uniform(0, 1)
  y ~ MvNormal(θ * Diagonal(ones(T)))
  return rand(MvNormal(θ * Diagonal(ones(T))))
end


t = rand(mymodel(rand(10)))[1]
param = (; θ = t)
generated_quantities(mymodel2(y), param)

does the job.

However, it is not satisfying:

  • When doing inference, e.g. chain2 = sample(mymodel2(y), MH(), 10000), the rand inside the model is useless, yet it is called every time the model is evaluated.
  • It is probably inefficient if one wants to call generated_quantities N times, compared to rand(MvNormal(θ * Diagonal(ones(T))), N).

If I’m reading this correctly what you want is the prior predictive distribution of your model.
Turing has the predict function you can use to get the prior predictive distribution for y.

In practice I’ve found this to be a bit hard to use - in fact I couldn’t get your model to work with the MvNormal but had to rewrite the distribution as a loop.

How it is used:

First define the model:

@model function mymodel(y)
    T = length(y)
    θ ~ Uniform(0, 1)

    for n in eachindex(y)
        y[n] ~ Normal(0, θ)
    end
end

Again, I’ve rewritten your y ~ ... as a for loop.
Then instantiate the model and sample from the prior for θ.

mod = mymodel(rand(100))
chn = sample(mod, Prior(), 1000)

Next, use the predict function to get y values for the samples for θ in chn.

mod_prior_pred = mymodel(Vector{Union{Missing, Float64}}(undef, 100))
prior_pred = predict(mod_prior_pred, chn)

The first step creates a new model with a missing vector as y.
These will be predicted by predict.
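If it helps, the predicted draws can then be pulled out of the resulting Chains object. A small sketch, assuming the standard MCMCChains API (the variable names `prior_pred` and `y[1]` follow the example above):

```julia
# Convert the Chains of predictions to a plain matrix (one row per sample,
# one column per predicted y component); Array(::Chains) is provided by MCMCChains.
y_draws = Array(prior_pred)      # here a 1000 × 100 matrix

# Individual components can also be addressed by their variable name:
y1_draws = prior_pred[Symbol("y[1]")]
```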

Note that this approach also works to get the posterior predictive distribution or predict new response values given some predictors (as in regression).
In this case we do not sample with Prior() but use an MCMC algorithm like NUTS() to get the posterior distribution. The rest of the example will be identical.

Thank you, I did not know about predict. This led me to the Turing guide, where they also sample from the prior by passing missing values.
Using predict seems faster than the method in the guide, though.
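For reference, the method in the guide passes missing observations directly into the model and calls sample again, roughly like this (a sketch of the idea, not verbatim from the guide; it reuses the loop-based mymodel from above):

```julia
# Prior sampling via a model instantiated with missing observations:
# since every y[n] is missing, Turing samples it instead of conditioning on it.
mod_missing = mymodel(Vector{Union{Missing, Float64}}(undef, 100))
chn_prior = sample(mod_missing, Prior(), 1000)  # chain contains θ and all y[n]
```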

However, as you said, it is inconvenient not to be able to use MvNormal; for reference, the error is

ERROR: MethodError: no method matching loglikelihood(::ZeroMeanDiagNormal{Tuple{Base.OneTo{Int64}}}, ::Vector{Union{Missing, Float64}})

i.e., the loglikelihood for MvNormal is not defined for vectors with missing elements.

I found this hack that seems to work with the MvNormal version (but not with the loop version!):

# my model
@model function mymodelMv(y::AbstractVector)
  T = length(y)
  θ ~ Uniform(0, 1)
  y ~ MvNormal(Diagonal(θ * ones(T)))
end

# same model, but with no observed y
@model function mymodelMv(T::Int)
    θ ~ Uniform(0, 1)
    y ~ MvNormal(Diagonal(θ * ones(T)))
end

function mysampler(model, N)
    y = model.args[:y]
    n = length(y)
    # model with no observations
    mod_prior_pred = model.f(n)
    # sample from the prior
    chn = sample(model, Prior(), N)
    # predict the "missing" y using the chain of θ values
    prior_pred = predict(mod_prior_pred, chn)
    θ = chn.value[:, 1]
    y = prior_pred.value
    return y, θ
end

Are there better suggestions that work (efficiently) for generic Turing models, i.e., a generic version of mysampler(model, N)?

This seems to be a known issue, see Sampling from prior using return value doesn't work with MvNormal · Issue #1559 · TuringLang/Turing.jl · GitHub.
I just tested it again and the example works if you pass a single missing value for y. Thinking about it, it also makes sense, since y = missing is treated as a single multivariate missing observation.

You can then write,

@model function mymodel(y, l = ismissing(y) ? 1 : length(y))
    θ ~ Uniform(0, 1)
    y ~ MvNormal(zeros(l), θ)
end

mod = mymodel(rand(10))
chn = sample(mod, Prior(), 1000)

mod_prior_pred = mymodel(missing, 10)
prior_pred = predict(mod_prior_pred, chn)

which is similar to your example, but

  1. As described above, y is not a vector of missing observations, but a single vector-variate missing observation.
  2. The second parameter l describes the dimension of the multivariate observation. This is important because, if y is missing, nothing is known about the dimensionality, and length(missing) is not defined. So l defaults to 1 if y is missing, but to length(y) if y is non-missing.

For example to draw a 10-dimensional y observation:

mod_prior_pred = mymodel(missing, 10)
prior_pred = predict(mod_prior_pred, chn)

Thanks, this works.

However, I wonder why this takes so much longer than just doing

function handmodel(N, l)
    θ = zeros(N)
    y = zeros(N, l)
    for i in 1:N
        θ[i] = rand(Uniform(0, 1))
        y[i, :] = rand(MvNormal(zeros(l), θ[i]))
    end
    return θ, y
end

N = 1000
l = 10
@btime begin
    chn = sample(mod, Prior(), N)
    mod_prior_pred = mymodel(missing, l)
    prior_pred = predict(mod_prior_pred, chn)
end
# 17.800 ms (392324 allocations: 21.96 MiB)
@btime handmodel(N, l)
# 122.900 μs (2004 allocations: 367.39 KiB)

Is it because of how Turing stores a lot of (useless in this case) info along the chain? That seems like a lot of overhead just for sampling from the prior. Or is there an option I am missing?

Ideally, I would like an “efficient” sampler like handmodel, using only rand, whose input is a Turing.jl model (because the syntax is nice :slight_smile:), so that if I want to use other sampling methods like NUTS() or MH() I can easily switch.

Hence, I believe the task is to extract the distributions and the conditional structure from the model (which I did not succeed in doing).

I am no expert on the Turing internals, but I think a good portion of the runtime - at least for this use case - is spent on constructing the output object. So if efficiency is an important criterion for you, it is probably best to write your own prior predictive sampler, just like you did with handmodel.
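One middle ground worth trying, if I remember the DynamicPPL API correctly, is that rand on a model draws every random variable that is not observed and returns a NamedTuple, which avoids building a Chains object entirely. A rough sketch (the helper name prior_samples is my own, and it assumes the mymodel(missing, l) variant defined earlier in this thread):

```julia
# Hypothetical generic prior sampler: draw N (θ, y) pairs from any Turing
# model whose observations are left missing, by calling rand(model), which
# (in DynamicPPL) returns a NamedTuple of all sampled variables.
function prior_samples(model, N)
    draws = [rand(model) for _ in 1:N]
    θ = [d.θ for d in draws]
    y = permutedims(reduce(hcat, (d.y for d in draws)))  # N × l matrix
    return θ, y
end

θs, ys = prior_samples(mymodel(missing, 10), 1000)
```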