Compare two Turing models with LOO: SE is NaN

DominiqueMakowski · July 30, 2023, 2:30pm

I am trying to compare two SequentialSamplingModels models with LOO using ParetoSmooth.jl, but I think I am doing something wrong:

using Turing
using SequentialSamplingModels
using Random
using Distributions
using DataFrames
using StatsPlots
using StatsModels
using StatsBase
using ParetoSmooth

# Generate data (1000 obs)
Random.seed!(6)

dist = LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3)
data = rand(dist, 1000)


# ---------------------
# Models
@model function model_lba(data; min_rt=minimum(data.rt))
    ν ~ filldist(Normal(0, 1), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)

    data ~ LBA(; ν, A, k, τ)
end
chain_lba = sample(model_lba(data), NUTS(), 1000)


@model function model_lnr(data; min_rt=minimum(data.rt))
    ν ~ filldist(Normal(0, 1), 2)
    σ ~ truncated(Normal(0, 1), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)

    data ~ LNR(; ν, σ, τ)
end
chain_lnr = sample(model_lnr(data), NUTS(), 1000)

When I run the following:

rez1 = psis_loo(model_lba(data), chain_lba)
rez2 = psis_loo(model_lnr(data), chain_lnr)
loo_compare((lba=rez1, lnr=rez2))

While the comparison works, the psis_loo() function returns:

┌ Warning: Some Pareto k values are extremely high (>1). PSIS will not produce consistent estimates.
└ @ ParetoSmooth C:\Users\domma\.julia\packages\ParetoSmooth\Ml7Gb\src\InternalHelpers.jl:47
Results of PSIS-LOO-CV with 1000 Monte Carlo samples and 1 data points. Total Monte Carlo SE of NaN.
┌───────────┬────────┬──────────┬────────┬─────────┐
│           │  total │ se_total │   mean │ se_mean │
├───────────┼────────┼──────────┼────────┼─────────┤
│   cv_elpd │ 452.85 │      NaN │ 452.85 │     NaN │
│ naive_lpd │ 456.28 │      NaN │ 456.28 │     NaN │
│     p_eff │   3.43 │      NaN │   3.43 │     NaN │
└───────────┴────────┴──────────┴────────┴─────────┘

I am not sure where it gets the “1 data points” from, as there are 1000 observations.

Moreover, in this post, Aki mentions that it is useful to look at the ELPD difference relative to its SE to get an idea of the magnitude of the difference. Hence, I am wondering whether this information (or some standardized difference) is available or can be computed. Thanks for any tips on model comparison!
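For reference, the usual way to put an ELPD difference on the scale of its SE is to work from the pointwise ELPD values of the two models: sum the pointwise differences and take sqrt(n * var(differences)) as the standard error. The sketch below only illustrates that arithmetic. The numbers and the helper name `elpd_difference` are made up for illustration, and it does not show how to extract pointwise values from a psis_loo result.

```julia
using Statistics

# Hypothetical pointwise ELPD values (one entry per observation) for two models.
# In a real analysis these would come from each model's PSIS-LOO result.
elpd_a = [-1.2, -0.8, -1.5, -0.9, -1.1]
elpd_b = [-1.0, -0.9, -1.2, -1.0, -0.8]

# Illustrative helper (not part of ParetoSmooth.jl): total ELPD difference
# (b minus a), its standard error, and the ratio of the two.
function elpd_difference(elpd_a::AbstractVector, elpd_b::AbstractVector)
    length(elpd_a) == length(elpd_b) ||
        throw(DimensionMismatch("both models must be evaluated on the same observations"))
    d = elpd_b .- elpd_a      # pointwise differences
    n = length(d)
    total = sum(d)            # total ELPD difference
    se = sqrt(n * var(d))     # SE of the sum of n pointwise differences
    return (diff = total, se = se, ratio = total / se)
end

res = elpd_difference(elpd_a, elpd_b)
println(res)
```

A difference that is small relative to its SE suggests the data do not clearly favor either model. The SE is only meaningful when both models are scored on the same observations, one value per observation.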

Christopher_Fisher · July 30, 2023, 2:59pm

I think the problem is that pointwise_log_likelihoods does not compute the pointwise values correctly for this type of model. I believe the following should return a 1000×1000×1 array:

pointwise_log_likelihoods(model_lba(data), chain_lba)

I’ll continue digging to see where the error occurs.

Christopher_Fisher · July 30, 2023, 4:20pm

I thought the issue was that nsamples was not defined. Adding that definition didn’t fix the problem. I’m not sure where in Turing the data size is computed.

DominiqueMakowski · July 30, 2023, 5:01pm

It works if we use the list of tuples specification:

@model function model_lba(data; min_rt=0.2)
    # Priors
    ν ~ filldist(Normal(0, 1), 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)

    # Likelihood
    for i in 1:length(data)
        data[i] ~ LBA(; ν, A, k, τ)
    end
end

dat = [(choice=data.choice[i], rt=data.rt[i]) for i in 1:length(data.rt)]
chain_lba = sample(model_lba(dat, min_rt=minimum(data.rt)), NUTS(), 1000)

rez1 = psis_loo(model_lba(dat, min_rt=minimum(data.rt)), chain_lba)

[ Info: No source provided for samples; variables are assumed to be from a Markov Chain. If the samples are independent, specify this with keyword argument `source=:other`.
Results of PSIS-LOO-CV with 1000 Monte Carlo samples and 1000 data points. Total Monte Carlo SE of 0.084.
┌───────────┬────────┬──────────┬───────┬─────────┐
│           │  total │ se_total │  mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│   cv_elpd │ 453.32 │    27.43 │  0.45 │    0.03 │
│ naive_lpd │ 457.57 │    27.26 │  0.46 │    0.03 │
│     p_eff │   4.25 │     0.30 │  0.00 │    0.00 │
└───────────┴────────┴──────────┴───────┴─────────┘

Christopher_Fisher · July 30, 2023, 5:42pm

That makes sense. Thanks for reporting. My guess is that length or something similar is called to extract the length of the vector.

Christopher_Fisher · July 30, 2023, 7:57pm

I thought maybe the problem is that SequentialSamplingModels was not complying with the interface, but the problem occurs with other models. Consider the following:

using Turing
using ParetoSmooth
using Distributions

@model function model(data)
    μ ~ Normal()
    data ~ Normal(μ, 1)
end

data = rand(Normal(0, 1), 100)

chain = sample(model(data), NUTS(), 1000)
rez1 = psis_loo(model(data), chain)

Replacing the vectorized form with a for loop fixes the problem. I wonder whether Turing should destructure the arrays to compute the pointwise values correctly. Of course, that may lead to other problems.
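A minimal sketch of the for-loop version of the model above, assuming the same packages as in the example. The name `model_loop` and the seed are only for illustration. Each observation gets its own `~` statement, so one log-likelihood term per data point should be recorded.

```julia
using Turing
using ParetoSmooth
using Distributions
using Random

Random.seed!(1)

@model function model_loop(data)
    μ ~ Normal()
    # One observe statement per data point instead of `data ~ Normal(μ, 1)`
    for i in eachindex(data)
        data[i] ~ Normal(μ, 1)
    end
end

data = rand(Normal(0, 1), 100)
chain = sample(model_loop(data), NUTS(), 1000)

# Check the number of data points seen by ParetoSmooth before running LOO;
# the first dimension is expected to index the observations.
ll = pointwise_log_likelihoods(model_loop(data), chain)
println(size(ll))

rez = psis_loo(model_loop(data), chain)
```

Checking the size of the pointwise log-likelihood array first is a quick way to confirm that the model is being split into individual observations rather than treated as a single data point.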

Christopher_Fisher · August 2, 2023, 10:02am

Using a for loop is currently the proper approach for using LOO with ParetoSmooth.jl. I will make a PR to explain that in the documentation. There might be a plan to improve the interface at some point in the future. Another alternative is ArviZ.jl.
