How to use ParetoSmooth.jl

It looks like ParetoSmooth.jl is fairly new so the documentation is lacking examples right now. I’m wondering if someone with experience using the package can assist. Here’s a simple MWE:

using LinearAlgebra
using ParetoSmooth
using StatsBase
using Turing

X = rand(1.0:100.0, 30)
Y = @. 3.0 + 2.0 * X + rand(Normal(0, 10))

@model function test(X, Y, σᵦ)
    σ ~ Exponential(1)
    α ~ Normal(0, 1)
    β ~ Normal(0, σᵦ)
    μ = @. α + β * X
    Y ~ MvNormal(μ, σ^2 * I)  # isotropic covariance; the scalar-σ MvNormal constructor is deprecated
end

Xₜ = standardize(ZScoreTransform, X)
Yₜ = standardize(ZScoreTransform, Y)

ch = sample(test(Xₜ, Yₜ, 1), IS(), 1_000) # Assuming I need to use IS() here

So I have my MCMC chain stored as ch. Now what? I’d like to calculate, for example, the leave-one-out cross validation score via psis_loo (documentation for which is here). How do I get the log_likelihood matrix?


LOO is completely independent of the inference method, so no need to use IS() here. In fact, IS() seems to have some problems, so let’s use NUTS() instead.

You can retrieve the log-likelihood matrix either with Turing.pointwise_loglikelihoods or with ParetoSmooth.pointwise_log_likelihoods, which formats the matrix as ParetoSmooth needs it. But there’s also a convenience function that computes the log-likelihoods for you:

julia> ch = sample(test(Xₜ, Yₜ, 1), NUTS(), MCMCThreads(), 1_000, 4);
┌ Info: Found initial step size
└   ϵ = 0.21250000000000002
┌ Info: Found initial step size
└   ϵ = 0.30000000000000004
┌ Info: Found initial step size
└   ϵ = 0.225
┌ Info: Found initial step size
└   ϵ = 0.0125
Sampling (4 threads) 100%|█████████████████████████████████████████████████████████████████| Time: 0:00:00

julia> psis_loo(test(Xₜ, Yₜ, 1), ch)
[ Info: No source provided for samples; variables are assumed to be from a Markov Chain. If the samples are independent, specify this with keyword argument `source=:other`.
┌ Warning: Some Pareto k values are extremely high (>1). PSIS will not produce consistent estimates.
└ @ ParetoSmooth ~/.julia/packages/ParetoSmooth/A6x5U/src/InternalHelpers.jl:43
Results of PSIS-LOO-CV with 4000 Monte Carlo samples and 1 data points. Total Monte Carlo SE of 0.37.
┌───────────┬───────┬──────────┬───────┬─────────┐
│           │ total │ se_total │  mean │ se_mean │
├───────────┼───────┼──────────┼───────┼─────────┤
│   cv_elpd │  5.65 │      NaN │  5.65 │     NaN │
│ naive_lpd │  8.19 │      NaN │  8.19 │     NaN │
│     p_eff │  2.53 │      NaN │  2.53 │     NaN │
└───────────┴───────┴──────────┴───────┴─────────┘
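If you do want the explicit log-likelihood route, here is a minimal sketch. It assumes (per the ParetoSmooth docs) that `pointwise_log_likelihoods` returns a 3-D array laid out as (data points, draws, chains) and that `psis_loo` accepts such an array directly; the toy array below only illustrates the shape:

```julia
# Toy stand-in for the array that
# ParetoSmooth.pointwise_log_likelihoods(test(Xₜ, Yₜ, 1), ch)
# would produce: dimensions are (data points, draws, chains).
n_data, n_draws, n_chains = 30, 1000, 4
ll = randn(n_data, n_draws, n_chains) .- 2.0  # fake pointwise log-likelihoods

size(ll)  # (30, 1000, 4)

# With a real log-likelihood array you would then call psis_loo(ll) directly.
```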

Hi Seth,

Thanks so much for taking the time to help me out with this - I really appreciate it! :pray: :smiley:

I still have a doubt about the values in the result, though. Since we are estimating model error, I would expect smaller values to be better, but the output of the following example is the opposite of what I expect:

using DataFrames
using LinearAlgebra
using MLBase
using ParetoSmooth
using StatsBase
using StatsPlots
using Turing

X = rand(1.0:100.0, 30)
# True data-generating model is second-order polynomial!
Y = @. 3.0 + 2.0 * X + 0.05 * X^2 + rand(Normal(0, 40))
Xₜ = StatsBase.standardize(ZScoreTransform, X)
Yₜ = StatsBase.standardize(ZScoreTransform, Y)

@model function test(X, Y, o) # 'o' is for 'order'
    σ ~ Exponential(1)
    α ~ Normal(0, 1)
    β ~ MvNormal(zeros(o), I)  # o standard-normal coefficients; MvNormal(o, 1) is deprecated
    μ = α .+ sum(β .* [X.^i for i in 1:o])
    Y ~ MvNormal(μ, σ^2 * I)   # isotropic covariance; scalar-σ MvNormal is deprecated
end

# PSIS_LOO
score1 = psis_loo(test(Xₜ, Yₜ, 1), sample(test(Xₜ, Yₜ, 1), NUTS(), 1_000))
score2 = psis_loo(test(Xₜ, Yₜ, 2), sample(test(Xₜ, Yₜ, 2), NUTS(), 1_000))
score3 = psis_loo(test(Xₜ, Yₜ, 3), sample(test(Xₜ, Yₜ, 3), NUTS(), 1_000))
score4 = psis_loo(test(Xₜ, Yₜ, 4), sample(test(Xₜ, Yₜ, 4), NUTS(), 1_000))

scores = [score1, score2, score3, score4]

plot(
    1:4,
    # column 1: the :cv_elpd totals; column 2: the :naive_lpd totals
    [[s.estimates[1, 1] for s in scores] [s.estimates[2, 1] for s in scores]],
    labels=["Out-of-Sample" "In-Sample"],
    legend=:topleft,
    marker=:o
)

Output:

[plot: psis_example1]

If I roll my own LOOCV function and then plot, it looks like this (result is what I expect for this example):

data = DataFrame(X=Xₜ,Y=Yₜ)

# leave-one-out cross validation using MLBase.LOOCV
function loocv(o)
    N = size(data, 1)
    SE = Float64[]
    for s in LOOCV(N)  # s = training indices; one point is held out each time
        chain = DataFrame(sample(test(data.X[s], data.Y[s], o), NUTS(), 1_000))
        x = data.X[setdiff(1:N, s)][1]  # the held-out point
        y = data.Y[setdiff(1:N, s)][1]
        ŷ = mean([row.α + sum([row[Symbol("β[$i]")] * x^i for i in 1:o]) for row in eachrow(chain)])
        push!(SE, (ŷ - y)^2)
    end
    return sum(SE)
end

order = 4
err = [loocv(o) for o in 1:order]
plot(1:order, err, legend=false, marker=:o)

Output:

[plot: loocv_example]

Finally, if I multiply the results from PsisLoo by negative 1, I get something that looks like what I would expect:

scores = [score1, score2, score3, score4]

plot(
    1:4,
    # negate so that smaller is better, matching the hand-rolled LOOCV plot
    [(-1 .* [s.estimates[1, 1] for s in scores]) (-1 .* [s.estimates[2, 1] for s in scores])],
    labels=["Out-of-Sample" "In-Sample"],
    legend=:topleft,
    marker=:o
)

Output:

[plot: psis_example2]

So, this was just a really long way to ask if (relative to other models) bigger values in :cv_elpd and :naive_lpd are better :slightly_smiling_face:

Ah, I see the source of confusion. ParetoSmooth gives the following descriptions:

These are somewhat misleading, because ELPD is a measure of predictive accuracy, not error, with a specific definition of accuracy: “lpd” stands for “log predictive density”, and “elpd” for “expected log predictive density”. Generally we are interested in the ELPD: if a data point i is left out, its ELPD is the log-likelihood of that point averaged over the posterior we would get by conditioning only on the other data points. This is what LOO estimates, and we want it to be maximized.

So, higher ELPD is better. You can check yourself that LOO estimates this by actually fitting the LOO posteriors, holding out each point i in turn and computing its expected log-likelihood.
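As a concrete illustration (a Base-only sketch with made-up numbers, not ParetoSmooth’s implementation): given a matrix of pointwise log-likelihoods with one row per data point and one column per posterior draw, the naive lpd is the sum over points of the log of the average likelihood across draws:

```julia
# Toy log-likelihood matrix: 2 data points × 3 posterior draws (made-up values).
ll = [-1.2 -0.9 -1.1;
      -2.0 -1.8 -2.2]

# lpd_i = log( (1/S) * Σ_s exp(ll[i, s]) ): log of the posterior-averaged likelihood.
lpd(row) = log(sum(exp, row) / length(row))

naive_lpd = sum(lpd(ll[i, :]) for i in axes(ll, 1))
# For LOO you would instead average over draws from the posterior that excludes
# point i; PSIS approximates this by reweighting the full-posterior draws.
```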

You should also look at p_eff (the effective number of parameters), which in general should not exceed the actual number of parameters in the model (if it does, that can indicate misspecification). So while the ELPD difference between the 3rd-order and 2nd-order polynomials is small (a difference of less than 4 is generally considered small), p_eff increases from under 3 to over 4.


Fantastic. This is amazing, thanks again!

You’re welcome!

A bit late to the party here, but how would you get a meaningful comparison between the LOO ELPD scores of two different models fit to the same data, especially when no standard error is reported?

Section 5 of [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC will likely be helpful for you. As that paper explains, the standard error of the ELPD is computed by treating the ELPD as a sum across data points. So in cases like this, where the likelihood is a single multivariate term (i.e. each log-density evaluation contributes one data point), the standard error is 0 (ParetoSmooth returns it as NaN). With that in mind, you can compare the two models’ ELPDs directly, using the principles in Cross-validation FAQ • loo.


Oh perfect, thanks! That paper was exactly what I needed.

I’ll probably try to work out my own version of a LOO ELPD to make sure I completely understand it, but I have one initial question after a quick read of that paper. In this example our data has 30 independent observations. If the elpd_loo returned from psis_loo is defined as in the paper, then is that value not the sum of the \hat{elpd}_{loo,i} calculated for each i of those 30 data points? Should there not be a standard error like se(\hat{elpd}_{loo}) = \sqrt{30 \, V_{i=1}^{30} \, \hat{elpd}_{loo,i}}?
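That formula can be sketched in Base Julia with hypothetical pointwise values: treating elpd_loo as a sum of n pointwise terms, its standard error is sqrt(n) times the standard deviation of those terms.

```julia
# Toy pointwise elpd values (made up), one per held-out data point.
elpd_i = [-0.8, -1.1, -0.9, -1.3]
n = length(elpd_i)

m = sum(elpd_i) / n                            # mean of the pointwise terms
v = sum((x - m)^2 for x in elpd_i) / (n - 1)   # sample variance V_{i=1}^n
se = sqrt(n * v)                               # se(elpd_loo) = sqrt(n * V)

elpd_loo = sum(elpd_i)                         # the total that psis_loo reports
```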

Yes, I believe that’s correct. Are you seeing something different in what ParetoSmooth returns?