Plotting NUTS trace

Hi all

I am running the following model for Bayesian updating:

using Turing, Distributions
using StatsPlots

(Script is reading the data here)

@model function BayesLinearRegression(x, y)
    intercept ~ filldist(Normal(0, 1), n_sites)
    slope ~ filldist(Normal(0, 1), n_sites)
    sigma = zeros(n_sites)
    sigma ~ filldist(Beta(1, 5), n_sites)
    y_hat = intercept[idx] .+ slope[idx] .* x
    y ~ MvNormal(y_hat, sigma[idx])
end
model = BayesLinearRegression(x, y)
chain = sample(model, NUTS(0.65), 3_000)

Then, I am visualizing the trace with:

plot(chain)

However, in the traceplot I only get one trace for each of the variables, while I should be getting n_sites traces for each.

Could you please explain how to fix the trace visualization?

I think there are some mistakes in your hierarchical model.
Where is idx?

Try to see some examples here: Multilevel Models (a.k.a. Hierarchical Models)

Thanks for your feedback. I have restructured the Bayesian model as follows:

@model function BayesLinearRegression(x, y, idx)
    n_sites=maximum(idx)
    intercept ~ filldist(Normal(0, 1), n_sites)
    slope ~ filldist(Normal(0, 1), n_sites)
    sigma ~ filldist(Beta(1, 5), n_sites)
    y_hat = intercept[idx] .+ slope[idx] .* x
    y ~ MvNormal(y_hat, sigma[idx])
end

Now, the plot doesn’t even show up in VS code’s plotting pane. What do you think is the problem?

This is hard. Are you on windows? Have you tried running on a terminal instead of vs code?

I have no problems getting a plot when I use your model. When you get the plot, you should have one trace for each parameter–that is, one traceplot for intercept[1], intercept[2], intercept[3], and so on. You shouldn’t see them all in the same traceplot.

Depending on how many parameters you have, the plot might appear tiny in the vs code window. You may have to zoom in to see it.

To see if it works with you data, I suggest trying with a small number of groups. Maybe subset 4 or 5 groups and see what happens.

To build on something Storopoli said: Based on your code, it looks like you are conducting n_sites numbers of regressions. That may be what you want (or not), but I thought I’d point it out.

Here’s my code that worked fine:

using CSV, DataFrames, RDatasets
using Turing
using Plots, StatsPlots


# Load the oft-used sleepstudy dataset
df = dataset("lme4", "sleepstudy") |> DataFrame

# prepare groups
subject_map = Dict(key => idx for (idx, key) in enumerate(unique(df.Subject)))
df.Subject_id = [subject_map[sj] for sj in df.Subject]

@model function BayesLinearRegression(x, y, idx)
    n_sites=maximum(idx)
    intercept ~ filldist(Normal(0, 1), n_sites)
    slope ~ filldist(Normal(0, 1), n_sites)
    sigma ~ filldist(Beta(1, 5), n_sites)
    y_hat = intercept[idx] .+ slope[idx] .* x
    y ~ MvNormal(y_hat, sigma[idx])
end

model = BayesLinearRegression(df.Reaction, df.Days, df.Subject_id)
chn = sample(model, NUTS(500, 0.65), 500)

plot(chn)

Thanks a lot for your help. It seems that I have to “search” the plotting pane for the plot, just because of the large number of subplots included.

By the way, at this stage I wanted to run n_sites individual regressions. Thanks for keeping an eye out for this.