WAIC computation

I am trying to fill in some gaps left from trying to implement model comparison for a model fitted in Turing. Following McElreath’s Statistical Rethinking (2nd ed.) and with help from Claude.ai, I have come up with a function that calculates the pointwise lppd and penalty terms for each observation, given posterior samples in chains:
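
For reference, with S posterior draws θ_s, the quantities computed below are the standard ones from Rethinking (and Gelman et al.):

lppd_i   = log( (1/S) * Σ_s p(y_i | θ_s) )
p_waic_i = Var_s( log p(y_i | θ_s) )
WAIC     = -2 * Σ_i (lppd_i - p_waic_i)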

# Assumes Turing/MCMCChains is already loaded, so DataFrame(chains) works
using DataFrames: DataFrame, select
using Distributions: Normal, logpdf
using LogExpFunctions: logsumexp
using Statistics: var

function my_waic(chains, X, Y)
    # Keep only the draws of the parameters we need
    dfc = select(DataFrame(chains), [:iteration, :chain, :σ, :β₀, :β₁])

    n = length(Y)
    lppds = Vector{Float64}(undef, n)
    pwaics = Vector{Float64}(undef, n)

    for (i, (x, y)) in enumerate(zip(X, Y))
        # Linear predictor and pointwise log-density for every posterior draw
        mus = dfc[:, :β₀] .+ dfc[:, :β₁] .* x
        log_pds = map((m, s) -> logpdf(Normal(m, s), y), mus, dfc[:, :σ])

        # lppd: log of the average likelihood over draws, computed stably
        lppds[i] = logsumexp(log_pds) - log(length(log_pds))

        # Penalty term (effective number of parameters)
        pwaics[i] = var(log_pds)
    end
    return lppds, pwaics
end

I can then use it as follows to get the total WAIC and its standard error:

lppds, pwaics = my_waic(chains, train_X, train_Y)
elpd = sum(lppds) - sum(pwaics)
WAIC = -2 * elpd
WAIC_SE = sqrt(length(lppds) * var(-2 .* (lppds - pwaics)))
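
(The SE formula follows the approach used in the WAIC/LOO papers: the total WAIC is a sum of n pointwise contributions, so its standard error is estimated as sqrt(n * var(pointwise contributions)).)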

Which gives me

WAIC = 368.9568650726459
WAIC_SE = 19.188152888480644

I did all this to try to understand the logic and the algorithm behind WAIC-based model comparison. I was aiming to recreate what ArviZ.waic does; however, my results are only somewhat consistent with ArviZ’s:

ll = let
    plls = pointwise_loglikelihoods(model, chains)
    # Ensure the ordering of the loglikelihoods matches the ordering of `posterior_predictive`
    log_likelihood_y = getindex.(Ref(plls), string.(keys(posterior_predictive)))
    (; Y=cat(log_likelihood_y...; dims=3))
end

idata = ArviZ.from_mcmcchains(chains;
    log_likelihood = ll,
    library = "Turing",
)

waic(idata)

which gives me

WAICResult with estimates
     elpd  elpd_mcse    p  p_mcse 
 -1.8e+02        9.6  3.0    0.43

According to Claude.ai, I am on the right track since WAIC is -2 * elpd.
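
(Sanity check with the numbers above: -368.9569 / 2 ≈ -184.48, which agrees with ArviZ’s displayed elpd of -1.8e+02, and 19.188 / 2 ≈ 9.59 agrees with the displayed 9.6.)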

I wonder if I am indeed on the right track with my “manual” computation of WAIC and WAIC_SE.

I would also like to know why ArviZ’s version returns values that are so coarsely rounded off.

Each of your questions pertains to functions defined in PosteriorStats, one of ArviZ’s component packages.

Your calculation of ELPD looks correct. Information criteria have historically been reported on multiple scales. Watanabe himself used the log score (ELPD), and the PSIS-LOO papers also adopt this convention. See the Cross-validation FAQ • loo for an explanation.
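
For reference, the common scales all carry the same information: the log scale reports elpd itself, the negative-log scale reports -elpd, and the deviance scale (the one Rethinking uses) reports -2 * elpd.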

Note that PosteriorStats primarily treats waic and loo as methods for estimating elpd and p. If you want an information criterion on a specific scale, it includes the utility information_criterion, to which you can pass the result of waic/loo to get the information criterion on your desired scale. As a final note, there are almost no cases where waic is better than loo; waic is primarily included for comparison with loo.
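
A minimal sketch of how that might look (the :deviance scale symbol is my assumption; check the PosteriorStats docs for the exact argument):

res = waic(idata)
information_criterion(res, :deviance)  # deviance scale, i.e. -2 * elpd ≈ 368.96 here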

When PosteriorStats returns an object with a custom show method that prints a table, it tries both to reduce visual clutter and to avoid communicating estimates with higher precision than is warranted. Specifically, this means that whenever a standard error of an estimate is computed, it is heuristically used to remove low-confidence significant digits. PosteriorStats is actually pretty conservative with its rounding and tends to display one or more digits more than are justified by the SE (e.g. in your example above, the result is likely in [-2.1e+02, -1.5e+02], so we can’t even state with high confidence what the first digit of the true ELPD is; still, we show 2 digits). The computed values themselves are not rounded and can be accessed in the fields of the object being shown.
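
For example, something along these lines should recover the unrounded numbers (the field and estimate names here are assumptions based on the printed table; check the WAICResult docstring):

res = waic(idata)
res.estimates        # unrounded elpd, its SE, and p
res.estimates.elpd   # ≈ -184.478 in this example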
