BoundsError on basic Turing model

rikh · January 1, 2021, 6:33pm

I’m trying to fit a basic model in Turing. The data consists of some values y (lets say a height) for different groups/categories x. I am trying to learn whether the y-values per group differ, or are the same. To make it a simple and minimal working example, I’ve come up with 6 data points.

If I understand Statistical Rethinking correctly, then the advise for the dataset defined by df would be to use multiple levels so that we can leverage shrinkage. In this case, I’ve planned α to learn the general mean, and γ to learn something about each group.

using Distributions
using Random
using MCMCChains
using Statistics
using Turing

df = DataFrame(
    x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    y = [90.0, 95.0, 100.0, 105.0, 110.0, 90.0]
)

@model function my_model(X, Y, n_groups)
    α ~ Normal(100, 10)
    β ~ Normal(0, 1)
    γ ~ filldist(Normal(0, 10), n_groups)
    σ ~ truncated(Cauchy(0, 2), 0, Inf)

    μ = α + β .* γ[X]
    Y .~ Normal.(μ, σ)
end

Random.seed!(0)
n_groups = length(unique(df.x))
chains = sample(my_model(df.x, df.y, n_groups), HMC(0.05, 10), 1000)

Unfortunately, I’ve tried many samplers and variations of this model, but it keeps giving errors. This one returns

ERROR: LoadError: BoundsError: attempt to access 2-element 
Array{Float64,1} at index [[1.0, 1.0, 2.0, 2.0, 3.0, 3.0]]

So, I see that γ[X] is wrong but not how to fix it. Anyone here who knows how to train a model like this?

I’m running Julia 1.5.3 with

  [a93c6f00] DataFrames v0.22.2
  [31c24e10] Distributions v0.23.12
  [c7f686f2] MCMCChains v4.4.0
  [fce5fe82] Turing v0.15.8

cpfiffer · January 1, 2021, 11:59pm

I’m really confused by γ[X] – what’s the intention of that? These are group means, presumably, and you’re trying to get the means out using group assignments x, is that right?

What you want is to get out those means and construct a new vector – probably something like map(x -> γ[i], X) instead.

rikh · January 2, 2021, 9:07am

I’m afraid that I didn’t understand your suggestion of constructing a new vector, but I did manage to fix the issue due to your first question! γ[X] was indeed weird.

@cpfiffer Thanks!

For the interested reader, now the model is

using DataFrames
using Distributions
using Random
using MCMCChains
using Statistics
using Turing

df = DataFrame(
    x = [1, 1, 2, 2, 3, 3],
    y = [90.0, 95.0, 100.0, 105.0, 110.0, 115.0]
)

@model function my_model(X, Y, n_groups)
    α ~ Normal(100, 10)
    σ ~ truncated(Cauchy(0, 2), 0, Inf)

    α_group ~ filldist(Normal(α, σ), n_groups)

    μ = α_group[X]
    Y .~ Normal.(μ, σ)
end

Random.seed!(0)
n_groups = length(unique(df.x))
chains = sample(my_model(df.x, df.y, n_groups), NUTS(0.65), 5000)

which is based on the multilevel model on page 403 of Statistical Rethinking 2nd Edition. The output is

┌ Info: Found initial step size
└   ϵ = 0.0125
Sampling 100%|██████████████████████████████████████████████| Time: 0:00:08
Chains MCMC chain (5000×17×1 Array{Float64,3}):

Iterations        = 1:5000
Thinning interval = 1
Chains            = 1
Samples per chain = 5000
parameters        = α, α_group[1], α_group[2], α_group[3], σ
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, max_hamiltonian_energy_error, n_steps, nom_step_size, numerical_error, step_size, tree_depth

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64

           α   102.1064    4.1199     0.0583    0.0753   3118.7230    0.9998
  α_group[1]    95.6620    4.0696     0.0576    0.0602   4150.9502    0.9999
  α_group[2]   102.2216    4.0961     0.0579    0.0667   3918.5134    0.9998
  α_group[3]   109.0440    4.0813     0.0577    0.0644   3742.4896    0.9999
           σ     6.1596    2.2583     0.0319    0.0512   1708.5842    1.0006

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

           α    93.5765    99.5942   102.1675   104.7270   110.1281
  α_group[1]    87.5076    93.3065    95.7007    98.0714   103.3676
  α_group[2]    93.9497    99.8729   102.3474   104.6314   110.0684
  α_group[3]   100.8964   106.6675   109.0747   111.4941   116.8196
           σ     3.4933     4.6871     5.6461     7.0598    11.4700

Topic		Replies	Views
Specify a separate MvNormal for each observation of a variable in Turing Probabilistic Programming turing	2	837	November 13, 2020
Help with first non-trivial Turing example Statistics	17	1302	May 7, 2020
Complicated GMM in Turing.jl - StackOverFlow error Probabilistic Programming error , turing	4	794	May 11, 2021
Parameter estimation problem in Turing Probabilistic Programming turing	11	508	July 14, 2022
Translating a TuringGLM model to a Turing model Probabilistic Programming	4	414	December 25, 2022

BoundsError on basic Turing model

Related topics