WAIC computation

Each of your questions pertains to functions defined in PosteriorStats, one of ArviZ’s component packages.

Your calculation of ELPD looks correct. Information criteria have historically been reported on multiple scales. Watanabe actually used the log score (ELPD), and the PSIS-LOO papers also adopt this convention. See the Cross-validation FAQ in the loo documentation for an explanation.
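For anyone who wants to check a WAIC-based ELPD by hand, here is a minimal sketch in plain Python (standard library only). It is not PosteriorStats code; the function names and the `log_lik[s][i]` layout (draw `s`, observation `i`) are assumptions made for illustration. It uses the usual definitions: the log pointwise predictive density minus the per-observation posterior variance of the log-likelihood, summed over observations.

```python
import math
from statistics import variance


def logmeanexp(xs):
    """Numerically stable log of the mean of exp(x) over xs."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def waic_elpd(log_lik):
    """Estimate ELPD with WAIC on the log scale.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s (hypothetical layout; needs >= 2 draws and >= 2 observations).
    Returns (elpd, p, se).
    """
    n_obs = len(log_lik[0])
    pointwise = []
    p_total = 0.0
    for i in range(n_obs):
        col = [row[i] for row in log_lik]
        lppd_i = logmeanexp(col)   # log pointwise predictive density
        p_i = variance(col)        # sample variance across draws
        pointwise.append(lppd_i - p_i)
        p_total += p_i
    elpd = sum(pointwise)
    # Standard error of a sum of n_obs pointwise terms
    se = math.sqrt(n_obs * variance(pointwise))
    return elpd, p_total, se
```

The returned `elpd` is on the log scale, so larger is better.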

Note that PosteriorStats primarily treats waic and loo as methods for estimating elpd and p. If you want an information criterion on a specific scale, pass the result of waic or loo to the utility information_criterion, which converts it to your desired scale. As a final note, there are almost no cases where waic is better than loo; it’s primarily included for comparison with loo.
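The scales differ only by a constant multiplier applied to the ELPD. A small sketch of that relationship in plain Python follows; the helper `to_information_criterion` and the scale labels are illustrative assumptions, not the PosteriorStats API.

```python
# Multipliers relating ELPD (log scale) to the other common scales.
SCALE_MULTIPLIER = {
    "log": 1.0,            # ELPD itself; larger is better
    "negative_log": -1.0,  # -ELPD; smaller is better
    "deviance": -2.0,      # -2 * ELPD; smaller is better
}


def to_information_criterion(elpd, scale):
    """Convert an ELPD estimate to the requested scale (hypothetical helper)."""
    return SCALE_MULTIPLIER[scale] * elpd
```

So a value reported on the deviance scale is just -2 times the value reported on the log scale, which is why the same fit can appear with very different numbers depending on the software's convention.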

When PosteriorStats returns an object with a custom show method that prints a table, it tries to both reduce visual clutter and avoid communicating estimates with higher precision than is warranted. Specifically, whenever a standard error of an estimate is computed, it is used heuristically to drop low-confidence significant digits. PosteriorStats is actually pretty conservative with its rounding and tends to display one or more digits beyond what the SE justifies. For example, in your result above, the true ELPD is likely in [-2.1e+02, -1.5e+02], so we can’t even state with high confidence what its first digit is. Still, we show 2 digits. The computed values themselves are not rounded and can be accessed in the fields of the object being shown.
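To make the idea of SE-based rounding concrete, here is a rough sketch in plain Python. This is an assumed, simplified heuristic (round the estimate to the decimal position of the SE's leading digit), not the rule PosteriorStats actually implements, and the numbers in the comments are hypothetical rather than taken from your output.

```python
import math


def round_to_se(estimate, se, extra_digits=0):
    """Round `estimate` to the position of the leading digit of `se`.

    Hypothetical display heuristic for illustration only. `extra_digits`
    keeps that many additional digits beyond the SE's leading digit.
    """
    if not math.isfinite(se) or se <= 0:
        return estimate  # no usable SE, so leave the value alone
    se_exponent = math.floor(math.log10(se))  # e.g. 15.3 -> 1 (tens place)
    return round(estimate, -se_exponent + extra_digits)


# Hypothetical values: an estimate of -183.72 with SE 15.3 is displayed
# to the tens place, because the SE's leading digit sits in the tens place.
shown = round_to_se(-183.72, 15.3)
```

Only the displayed value is rounded here; the unrounded `estimate` remains available, mirroring how the underlying fields keep full precision.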
