WAIC, or LOO etc in Turing?

Hello everyone,

I was wondering if any of the popular model selection criteria have been implemented in Turing? for example WAIC or LOO, etc? I am surprised that I can’t find mention of these in this forum or in the docs!

Thank you!

1 Like

Turing’s support for model comparison could use some extending. If you’re interested in opening a PR over in the MCMCChains repo (which we would think of as being the place to store model comparison stuff) I would be happy to help out.

I know that Seth Axen recently extended Python’s excellent arviz package to work with Turing (and all MCMCChains compatible chain objects). arviz supports WAIC and some other stuff. The package is still a WIP so I don’t know how much functionality there is on the statistics side.

1 Like

I have LOO in my package feel free to copy pasta it, but really you could write a cleaner version in probably 10 minutes(I know I could)

edit - Ew python! Someone just write it in Julia!

I’ve been taking a look at @sethaxen 's work and it looks amazing (on behalf of Bayesians everywhere: thank you Seth! )

I see that ArviZ does indeed wrap some of the functions I was thinking of (waic() and loo() for example) It seems that I am missing a step.

when I follow the example from the ArviZ readme, and then try to run waic(data) I get an error saying there is no log likelihood:

using ArviZ, PyPlot, Turing
ArviZ.use_style(["default", "arviz-darkgrid"])

# Turing model
@model school8(J, y, sigma) = begin
    mu ~ Normal(0, 5)
    tau ~ Truncated(Cauchy(0, 5), 0, Inf)
    theta ~ Normal(mu, tau)
    for j = 1:J
        y[j] ~ Normal(theta, sigma[j])
    end
end

J = 8
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]
model_fun = school8(J, y, sigma)
sampler = NUTS(0.8)
chn = mapreduce(c -> sample(model_fun, sampler, 1000), chainscat, 1:4) # 4 chains

data = convert_to_inference_data(chn) # convert MCMCChains.Chains to InferenceData
summary(data) # show summary statistics

waic(data)
loo(data)

specifically the error says data must include log_likelihood in sample_stats.

I’d really appreciate any advice!

For LOO and WAIC to work, you need to save the log likelihood values of your individual samples. I don’t know Turing, but in Stan you would put something like this in the generated quantities block:

vector[J] log_likelihood;
for (j in 1:J) {
    log_likelihood[j] = normal_lpdf(y[j] | theta, sigma[j]);
}

I would also suggest using LOO instead of WAIC as it has better diagnostics.

2 Likes

Thanks for the kind words and for using ArviZ.jl! It now supports all of ArviZ’s API and as of yesterday has some usage docs.

@daniel is right about log_likelihood being a generated quantity in Stan. Turing doesn’t have this yet, though from what I understand, the Turing devs are working on it.

We had a discussion on the Julia slack workspace last week, and Jonas Mackerodt (@Jonas ?) shared this trick for getting the log likelihood and predictive values that I haven’t been able to try in detail yet. Perhaps it will be useful

using Turing

@model school8(J, y, sigma, gen=false) = begin
    mu ~ Normal(0, 5)
    tau ~ Truncated(Cauchy(0, 5), 0, Inf)
    eta = tzeros(J)
    eta ~ [Normal(0.0, 1.0)]
    theta = mu .+ tau .* eta
    if gen == true
        y_hat = Vector{Real}(undef, J)
        log_lik = Vector{Real}(undef, J)
        for j = 1:J
            dist = Normal(theta[j], sigma[j])
            y[j] ~ dist
            y_hat[j] = rand(dist)
            log_lik[j] = logpdf(dist, y[j])
        end
        return (mu=mu, tau=tau, eta=eta, theta=theta, y_hat=y_hat, log_lik=log_lik)
    end
end

@model school8_mv(J, y, sigma, gen=false) = begin
    mu ~ Normal(0, 5)
    tau ~ Truncated(Cauchy(0, 5), 0, Inf)
    #eta = tzeros(J)
    eta ~ MvNormal(fill(0.0,J), fill(1.0,J))
    theta = mu .+ tau .* eta
    if gen == true
        y_hat = Vector{Real}(undef, J)
        log_lik = Vector{Real}(undef, J)
        for j = 1:J
            dist = Normal(theta[j], sigma[j])
            y[j] ~ dist
            y_hat[j] = rand(dist)
            log_lik[j] = logpdf(dist, y[j])
        end
        return (mu=mu, tau=tau, eta=eta, theta=theta, y_hat=y_hat, log_lik=log_lik)
    end
end

J = 8;
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0];
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0];

model_fun = school8(J, y, sigma);
model_fun_mv = school8_mv(J, y, sigma);
#model_fun() # correctly draws a NamedTuple

chns = sample(model_fun, NUTS(2000,0.8), 4000) # lacks theta, y_hat, and log_lik
chns_mv = sample(model_fun_mv, NUTS(2000,0.8), 4000) # lacks theta, y_hat, and log_lik

mixeddensity(chns,[:mu, :tau])
mixeddensity!(chns_mv,[:mu, :tau])

function get_nlogp(model,sm)
    # Set up the model call, sample from the prior.
    vi = Turing.VarInfo(model)
    # Define a function to optimize.
    function nlogp(sm)
        spl = Turing.SampleFromPrior()
        new_vi = Turing.VarInfo(vi, spl, sm)
        a=model(new_vi, spl)
        return(a)
    end
    return nlogp(sm)
end

model_fun_ppc = school8(J, y, sigma, true)
model_fun_mv_ppc = school8_mv(J, y, sigma, true)

mu_post = Array(chns[:mu])
tau_post = Array(chns[:tau])
eta_post = Array(chns[:eta])

mu_post_mv = Array(chns_mv[:mu])
tau_post_mv = Array(chns_mv[:tau])
eta_post_mv = Array(chns_mv[:eta])

get_nlogp(model_fun_ppc,vcat(mu_post[1],tau_post[1],eta_post[1,:]))
get_nlogp(model_fun_mv_ppc,vcat(mu_post_mv[1],tau_post_mv[1],eta_post_mv[1,:]))

thanks Seth for a swift reply! I’m very happy to know how to get generated quantities out of a Turing model!

I wonder if obtaining the log-likelihood is only half the battle. I would much prefer to not have to create my own LOO-PSIS function, and rely instead upon ArviZ.

I’ve been trying this using the Stan interface. But to my surprise, even when I change the name of the generated quantity to log_likelihood (which ArviZ seems to expect), loo and waic seem not to detect it

using CmdStan
using ArviZ

set_cmdstan_home!(homedir() * "/cmdstan/")

J = 8
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]
schools = [
    "Choate",
    "Deerfield",
    "Phillips Andover",
    "Phillips Exeter",
    "Hotchkiss",
    "Lawrenceville",
    "St. Paul's",
    "Mt. Hermon"
];

schools_code = """
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}

parameters {
  real mu;
  real<lower=0> tau;
  real theta[J];
}

model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}

generated quantities {
    vector[J] log_likelihood;
    vector[J] y_hat;
    for (j in 1:J) {
        log_likelihood[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
        y_hat[j] = normal_rng(theta[j], sigma[j]);
    }
}
"""

schools_dat = Dict("J" => J, "y" => y, "sigma" => sigma)
stan_model = Stanmodel(
    model = schools_code,
    nchains = 4,
    num_warmup = 1000,
    num_samples = 1000,
)
_, stan_chns, _ = stan(stan_model, schools_dat, summary = false);

stan_infdata = convert_to_inference_data(stan_chns)

waic(stan_infdata)

This last line gives the same error as previously. Is there something obvious that I’m not getting?

You’re quite welcome!

convert_to_inference_data is a pretty feature-sparse conversion of a data type to an InferenceData. In this case, it takes the data in stan_chns and converts it. Stan’s output doesn’t differentiate between generated quantities and parameters and moreover between log likelihood and posterior predictions. If you want generated quantities to be treated specially in ArviZ, you’ll want to use a specialized conversion function.

In ArviZ, these start with from_. from_mcmcchains or from_cmdstan will work for you. Check out the worked example using CmdStan on 8 schools in the quickstart for the signature.

1 Like