I’m trying to fit a basic model in Turing. The data consists of some values y (let’s say a height) for different groups/categories x. I am trying to learn whether the y-values differ between groups or are the same. To keep this a minimal working example, I’ve come up with 6 data points.
If I understand Statistical Rethinking correctly, the advice for the dataset defined by df would be to use multiple levels so that we can leverage shrinkage. In this case, I’ve planned α to learn the general mean, and γ to learn something about each group.
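Written out, the model I have in mind looks roughly like this. It is only a sketch in my own notation: x_i is the group of observation i, K is the number of groups, and Cauchy⁺ means a Cauchy truncated to positive values. The priors are the ones I picked for this example, not a recommendation.

$$
\begin{aligned}
\alpha &\sim \mathrm{Normal}(100, 10) \\
\beta &\sim \mathrm{Normal}(0, 1) \\
\gamma_k &\sim \mathrm{Normal}(0, 10), \quad k = 1, \dots, K \\
\sigma &\sim \mathrm{Cauchy}^{+}(0, 2) \\
\mu_i &= \alpha + \beta \, \gamma_{x_i} \\
y_i &\sim \mathrm{Normal}(\mu_i, \sigma)
\end{aligned}
$$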
using DataFrames
using Distributions
using Random
using MCMCChains
using Statistics
using Turing

# Six observations in three groups
df = DataFrame(
    x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    y = [90.0, 95.0, 100.0, 105.0, 110.0, 90.0]
)

@model function my_model(X, Y, n_groups)
    α ~ Normal(100, 10)                     # general mean
    β ~ Normal(0, 1)
    γ ~ filldist(Normal(0, 10), n_groups)   # one parameter per group
    σ ~ truncated(Cauchy(0, 2), 0, Inf)
    μ = α + β .* γ[X]
    Y .~ Normal.(μ, σ)
end

Random.seed!(0)
n_groups = length(unique(df.x))
chains = sample(my_model(df.x, df.y, n_groups), HMC(0.05, 10), 1000)
Unfortunately, I’ve tried many samplers and variations of this model, but every attempt ends in an error. This version returns:
ERROR: LoadError: BoundsError: attempt to access 2-element Array{Float64,1} at index [[1.0, 1.0, 2.0, 2.0, 3.0, 3.0]]
So I can see that γ[X] is wrong, but not how to fix it. Does anyone here know how to fit a model like this?
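To narrow it down, here is a minimal sketch of just the indexing step in plain Julia, without Turing. The names gamma and x are stand-ins I made up for γ and df.x, and the assumption being tested is that the Float64 group labels are what breaks the indexing. I have not verified that converting the labels to Int is the right fix inside the model itself.

```julia
# Plain-Julia sketch of the indexing step, no Turing involved.
# gamma and x are hypothetical stand-ins for γ and df.x.
gamma = [0.1, 0.2, 0.3]                 # one value per group
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]      # group labels stored as Float64

# Indexing a vector with Float64 labels throws, so record whether it failed.
float_index_fails = try
    gamma[x]
    false
catch
    true
end

# Converting the labels to Int picks one gamma value per observation.
idx = Int.(x)
per_obs = gamma[idx]

println(float_index_fails)
println(per_obs)
```

If this carries over, the same conversion could presumably be applied to df.x before passing it to the model, but that is a guess on my part.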
I’m running Julia 1.5.3 with
[a93c6f00] DataFrames v0.22.2
[31c24e10] Distributions v0.23.12
[c7f686f2] MCMCChains v4.4.0
[fce5fe82] Turing v0.15.8