Turing: Posterior Prediction

Hi,

I’ve been trying to use the predict function in Turing for posterior prediction, as described here: https://discourse.julialang.org/t/ann-turing-0-13-0-mle-map-and-prediction/40033#prediction-2

Based on that, I tried applying the same syntax to this example: https://storopoli.io/Bayesian-Julia/pages/11_multilevel_models/#random-intercept_model

using Pkg
Pkg.activate(".")

using Turing
using StatsBase
using LinearAlgebra
using Statistics: mean, std
using Random: seed!
seed!(123)


using DataFrames
using CSV
using HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)
describe(cheese)
for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    if b == "rural"
        1
    elseif b == "urban"
        2
    else
        missing
    end
end

first(cheese, 5)

@model function varying_intercept(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];


model_intercept = varying_intercept(X, idx, y)
chain_intercept = sample(model_intercept, NUTS(), MCMCThreads(), 1_000, 4)
println(DataFrame(summarystats(chain_intercept)))


pred_model = varying_intercept(X, idx, Vector{Union{Missing, Float64}}(undef, length(y)))
predictions = StatsBase.predict(pred_model, chain_intercept)

# c = sample(pred_model,  NUTS(), 1_000)

Essentially, I tried using the predict function on the same dataset but with missing y-values, just to get a feel for the syntax, but I got the following error:

ERROR: MethodError: no method matching Normal(::Missing, ::Missing)

I’ve seen other posts raising issues with the MvNormal function: https://discourse.julialang.org/t/turing-jl-how-to-generate-samples-observations-from-a-model/95258/2, but that was some time ago, and I thought it might have been addressed since.

I’ve also tried following the guide (https://turinglang.org/dev/docs/using-turing/guide/#treating-observations-as-random-variables) and used the sample function instead (uncommenting the final line c = sample(pred_model, NUTS(), 1_000) and commenting out the StatsBase.predict line in the block above).

Any hints as to how I can get the posterior prediction to work here?

Thanks in advance!

Solved…

  1. Replaced the MvNormal with a for loop (note that Normal takes the standard deviation, so σ rather than σ^2, to match MvNormal(ŷ, σ^2 * I)):
for i in 1:length(y)
    y[i] ~ Normal(ŷ[i], σ)
end
  2. Removed the priors’ reliance on mean(y) and std(y)… they can’t be evaluated if the y response is missing (a sketch of the reworked model is below).
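
For reference, here’s a rough sketch of what the reworked model might look like with both changes applied (untested; mean_y, std_y and the model name varying_intercept_loop are names I’m introducing to carry the values that mean(y) and std(y) used to supply inside the model):

@model function varying_intercept_loop(
    X, idx, y, mean_y, std_y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    # priors no longer evaluate mean(y)/std(y) inside the model
    α ~ Normal(mean_y, 2.5 * std_y)         # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std_y)                  # residual SD
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # group-level intercepts

    # element-wise likelihood so each y[i] can be missing individually
    ŷ = α .+ X * β .+ αⱼ[idx]
    for i in 1:length(y)
        y[i] ~ Normal(ŷ[i], σ)
    end
end;

model_loop = varying_intercept_loop(X, idx, y, mean(y), std(y))                  # fit
pred_loop  = varying_intercept_loop(X, idx, similar(y, Missing), mean(y), std(y)) # predict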

Point 1 does mean a slight (c. 20%) slowdown in the code, though.

I also tried y .~ Normal.(ŷ, fill(σ, length(y))), and it’s even slower.

Also tried y ~ arraydist([Normal(ŷ[i], σ) for i in 1:length(y)]); no luck with the predict function there either: MethodError: no method matching loglikelihood(::Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, ::Vector{Union{Missing, Float64}})

Any suggestions on how to speed it up further whilst getting it to work with the predict function?

Did you try with filldist?

z ~ filldist(Normal(0, σ), length(y))
y = ŷ .+ z # or maybe y .= ŷ .+ z would be better? dunno

Take a look at the performance tips here: https://turing.ml/dev/docs/using-turing/performancetips

So this is quite a common issue, unfortunately. I’m a bit uncertain how we can best communicate this; maybe it’s worth adding an FAQ to the website or something.

But regarding the issue: you want to make the thing on the LHS of a ~ evaluate to missing to make an “observation” become something to be sampled. In your case, you have y ~ MvNormal(...), so you need to make y === missing, i.e. you need to pass in y=missing rather than y=fill(missing, ...).

Of course, as you observed, when you do that, mean(y) won’t work, so you have to pass that value in as an argument. :)
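
For concreteness, a rough sketch of the call pattern this implies (assuming a variant of varying_intercept that keeps the y ~ MvNormal(ŷ, σ^2 * I) likelihood but takes the prior location/scale as extra arguments, e.g. varying_intercept(X, idx, y, mean_y, std_y); the extra argument names are mine):

model_intercept = varying_intercept(X, idx, y, mean(y), std(y))
chain_intercept = sample(model_intercept, NUTS(), MCMCThreads(), 1_000, 4)

# `y === missing` here, so the whole `y ~ MvNormal(...)` statement gets sampled:
pred_model = varying_intercept(X, idx, missing, mean(y), std(y))
predictions = predict(pred_model, chain_intercept)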

Does this help?

Thanks for looking at this! Unfortunately I’m still quite new to Julia and Turing, and I’m not quite sure what you mean by needing to make y === missing.

Would you mind modifying my example code in the original post so I can follow a bit better?

Edit: as an aside, I’ve updated the predict line to

pred_model = varying_intercept(X, idx, similar(y, Missing))

instead of the Vector{Union{Missing, Float64}}(undef, length(y)) version… it seems cleaner.

Effectively, when you see / write something like

y ~ MvNormal(...)

in a Turing.jl @model, this is converted into something that looks like the following:

# If `y` is in the function arguments, we only treat it as a random variable if it is `missing`.
if :y in model_args && y === missing
    # Treat as random variable, e.g. sample its value.
    # ...
else
    # Treat as observation.
    # ...
end

Similarly, if you have the line

y[1] ~ Normal(...)

then this is converted into

if Symbol("y[1]") in model_args && y[1] === missing
    # Treat as random variable, e.g. sample its value.
    # ...
else
    # Treat as observation.
    # ...
end

So here you see that in the first example y itself is checked against missing, while in the second example y[1] is checked. Hence, in the first example, if we have y = [missing], then the check y === missing evaluates to false => y is treated as an observation; in the second example, y[1] === missing does evaluate to true => y[1] is sampled.
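
A quick plain-Julia illustration of the difference (nothing Turing-specific here):

y = [missing, missing]
y === missing       # false -> `y ~ MvNormal(...)` treats y as an observation
y[1] === missing    # true  -> an element-wise `y[i] ~ Normal(...)` would be sampled

y = missing
y === missing       # true  -> now `y ~ MvNormal(...)` is sampled as well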

Does that help?

And to clarify the usage of ===: x === y is like x == y, but it only evaluates to true if x and y are identical “in all ways”, whereas x == y can evaluate to true if, say, two arrays have the same values even though the underlying memory is not the same. In the above scenario == would also be valid, but === allows the Julia compiler to make further optimizations.
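
For example, in plain Julia:

a = [1.0, 2.0]
b = [1.0, 2.0]
a == b   # true  -> element-wise values are equal
a === b  # false -> two distinct arrays in memory
a === a  # true  -> literally the same object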

EDIT: Note that the above is not actually what the macro does (we don’t use Symbol but our own structure called VarName, etc.), but it’s more-or-less accurate.


Brilliant - many thanks! That explanation of what the macro is converting to makes a lot of sense.

In the end it’s as simple as:

pred_model = varying_intercept(X, idx, missing)
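
followed by the same predict call as in the original attempt (using the reworked priors from the earlier fix, since mean(y) would otherwise fail on missing):

predictions = predict(pred_model, chain_intercept)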

:man_facepalming:
