# Turing: Posterior Prediction

Hi,

I’ve been trying to use the `predict` function in `Turing` for posterior prediction, as described here: https://discourse.julialang.org/t/ann-turing-0-13-0-mle-map-and-prediction/40033#prediction-2

Based on that, I tried applying the same syntax to this example: https://storopoli.io/Bayesian-Julia/pages/11_multilevel_models/#random-intercept_model

```julia
using Pkg
Pkg.activate(".")

using Turing
using StatsBase
using LinearAlgebra
using Statistics: mean, std
using Random: seed!
seed!(123)

using DataFrames
using CSV
using HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)
describe(cheese)
for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    if b == "rural"
        1
    elseif b == "urban"
        2
    else
        missing
    end
end

first(cheese, 5)

@model function varying_intercept(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    # priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD

    # prior for variance of random intercepts;
    # usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # group-level intercepts

    # likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_intercept = varying_intercept(X, idx, y)
chain_intercept = sample(model_intercept, NUTS(), MCMCThreads(), 1_000, 4)
println(DataFrame(summarystats(chain_intercept)))

pred_model = varying_intercept(X, idx, Vector{Union{Missing, Float64}}(undef, length(y)))
predictions = StatsBase.predict(pred_model, chain_intercept)

# c = sample(pred_model,  NUTS(), 1_000)

```

Essentially, I tried using the `predict` function on the same dataset but with `Missing` y-values, just to get a feel for the syntax, but I got the following error:

```
ERROR: MethodError: no method matching Normal(::Missing, ::Missing)
```

I’ve seen other posts raising issues with the `MvNormal` function (https://discourse.julialang.org/t/turing-jl-how-to-generate-samples-observations-from-a-model/95258/2), but that was some time ago, and I thought it might have been addressed by now.

I’ve also tried following the guide (https://turinglang.org/dev/docs/using-turing/guide/#treating-observations-as-random-variables) and using the `sample` function instead (un-commenting the final line `c = sample(pred_model, NUTS(), 1_000)` and commenting out the `StatsBase.predict` line in the block above).

Any hints as to how I can get the posterior prediction to work here?

Solved…

1. Replaced the `MvNormal` with a for loop:
```julia
for i in 1:length(y)
    y[i] ~ Normal(ŷ[i], σ)  # Normal takes a standard deviation, not a variance
end
```
2. Removed the prior’s reliance on `mean(y)`… it can’t be initialized if the `y` response is `Missing` (both changes are sketched below).
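
For reference, the revised model looks roughly like this (`mean_y` and `std_y` are new argument names, so the priors no longer depend on `y`):

```julia
# Sketch of the revised model: the summary statistics are passed in as
# arguments, and the MvNormal likelihood is replaced by a loop of scalar Normals.
@model function varying_intercept(
    X, idx, y, mean_y, std_y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    α ~ Normal(mean_y, 2.5 * std_y)         # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std_y)                  # residual SD
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # group-level intercepts

    ŷ = α .+ X * β .+ αⱼ[idx]
    for i in 1:length(y)
        y[i] ~ Normal(ŷ[i], σ)
    end
end;
```

Note that with this version `y` must still be a vector when predicting (e.g. `similar(y, Missing)`), so that `length(y)` and the per-element `y[i]` accesses work.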

Point 1 does mean a slight (c. 20%) slowdown in the code, though.

Tried using `y .~ Normal.(ŷ, fill(σ, length(y)))`, and it’s even slower still.

Also tried `y ~ arraydist([Normal(ŷ[i], σ) for i in 1:length(y)])`, but had no luck with the `predict` function there either: `MethodError: no method matching loglikelihood(::Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, ::Vector{Union{Missing, Float64}})`.

Any suggestions on how to speed it up further whilst getting it to work with the `predict` function?

Did you try with `filldist`?

```julia
z ~ filldist(Normal(0, sigma), length(y))
y = yhat + z  # or maybe y .= yhat .+ z would be better? dunno
```

Take a look at the performance tips here: https://turing.ml/dev/docs/using-turing/performancetips

So this is quite a common issue, unfortunately. I’m a bit uncertain how we can best communicate it; maybe it’s worth adding an FAQ to the website or something.

But regarding the issue: to make an “observation” into something that gets sampled, you want the thing on the LHS of `~` to evaluate to `missing`. In your case you have `y ~ MvNormal(...)`, so you need `y === missing` to hold, i.e. you need to pass in `y=missing` rather than `y=fill(missing, ...)`.

Of course, as you observed, `mean(y)` won’t work when you do that, so you have to pass it in as an argument :)
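
For example, something along these lines (a sketch; it assumes `varying_intercept` keeps the `MvNormal` likelihood but takes the prior location and scale as extra arguments, say `mean_y` and `std_y`):

```julia
# Fit on the observed data...
mean_y, std_y = mean(y), std(y)
model_intercept = varying_intercept(X, idx, y, mean_y, std_y)
chain_intercept = sample(model_intercept, NUTS(), MCMCThreads(), 1_000, 4)

# ...then construct the model with `y = missing`: inside the model,
# `y === missing` now holds, so `y ~ MvNormal(...)` is sampled.
pred_model = varying_intercept(X, idx, missing, mean_y, std_y)
predictions = predict(pred_model, chain_intercept)
```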

Does this help?

Thanks for looking at this! Unfortunately I’m still quite new to Julia and Turing, and I’m not quite sure what you mean about needing to make `y === missing`.

Would you mind modifying my example code in the original post so I can follow a bit better?

Edit: as an aside, I’ve updated the `predict` line to this:

`pred_model = varying_intercept(X, idx, similar(y, Missing))` instead of `Vector{Union{Missing, Float64}}(undef, length(y))`; it seems cleaner.

Effectively, when you see / write something like

```julia
y ~ MvNormal(...)
```

in a Turing.jl `@model`, this is converted into something that looks like the following:

```julia
# If `y` is among the model arguments, we only treat it as a random
# variable if it is `missing`.
if :y in model_args && y === missing
    # Treat as a random variable, i.e. sample its value.
    # ...
else
    # Treat as an observation.
    # ...
end
```

Similarly, if you have the line

```julia
y[1] ~ Normal(...)
```
then this is converted into

```julia
if Symbol("y[1]") in model_args && y[1] === missing
    # Treat as a random variable, i.e. sample its value.
    # ...
else
    # Treat as an observation.
    # ...
end
```

So here you see that in the first example `y` itself is checked against `missing`, while in the second example `y[1]` is checked. Hence, if we have `y = [missing]`, then in the first example the check `y === missing` evaluates to `false`, so `y` is treated as an observation, while in the second example `y[1] === missing` does evaluate to `true`, so `y[1]` is sampled.

Does that help?

And to clarify the usage of `===`: `x === y` only evaluates to `true` if `x` and `y` are identical “in all ways”, while `x == y` can evaluate to `true` if, say, two arrays have the same values even though they are backed by different memory. In the scenario above `==` would also be valid, but `===` allows the Julia compiler to make further optimizations.
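
A quick way to see the distinction:

```julia
y = [missing]
y === missing     # false: a Vector{Missing} is not the value `missing`
y[1] === missing  # true: the element itself is `missing`
```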

EDIT: Note that the above is not actually what the macro does (we don’t use `Symbol` but our own structure called `VarName`, etc.), but it’s more-or-less accurate.


Brilliant - many thanks! That explanation of what the macro is converting to makes a lot of sense.

In the end it’s as simple as:

```julia
pred_model = varying_intercept(X, idx, missing)
```
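
and then `predict` returns an `MCMCChains.Chains` of sampled `y` values, which can be summarized like any other chain, e.g.:

```julia
predictions = predict(pred_model, chain_intercept)
# posterior predictive mean for each observation
pred_mean = vec(mean(Array(predictions); dims=1))
```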
