Turing model producing unexpected summary statistics

I have a very simple Turing model where all variables are Normal or TruncatedNormal. For some reason, the summary statistics of the model are not the ones I expected. Can you please help with this MWE?

using Turing
using Distributions
using LinearAlgebra
using Random

@model function Model(X, Y, params)
  # size of design matrix
  n, p = size(X)

  # hyperparameters
  σF = params.σF
  σP = params.σP
  σM = params.σM
  τF = params.τF
  τP = params.τP
  τM = params.τM
  cF = params.cF
  cP = params.cP
  cM = params.cM

  αF ~ Normal(cF, σF)
  αP ~ Normal(cP, σP)
  αM ~ Normal(cM, σM)
  
  βF ~ MvNormal(τF^2 * I(p))
  βP ~ MvNormal(τP^2 * I(p))
  βM ~ MvNormal(τM^2 * I(p))

  for i in 1:n
    x = view(X, i, :)

    NF = Normal(αF + βF⋅x, σF)
    NP = Normal(αP + βP⋅x, σP)
    NM = Normal(αM + βM⋅x, σM)

    Y[i, 1] ~ Truncated(NF, 1000.0, 4000.0)
    Y[i, 2] ~ Truncated(NP, 0.0, Y[i, 1])
    Y[i, 3] ~ Truncated(NM, 0.2, 1.8)
  end
end

# ------------
# MAIN SCRIPT
# ------------

Random.seed!(2021)

X = randn(100, 10)

Y = [2200 80 1] .+ randn(100, 3)

params = (
  σF=80.0, σP=3.0, σM=0.07,
  τF=80.0, τP=3.0, τM=0.07,
  cF=2365.67, cP=81.86, cM=0.93,
  A=106.0
)

model = Model(X, Y, params)

prior = sample(model, Prior(), MCMCThreads(), 125, 8)

The summary statistics say that the betas have non-zero mean:

Summary Statistics
  parameters        mean       std   naive_se      mcse         ess      rhat   ess_per_sec 
      Symbol     Float64   Float64    Float64   Float64     Float64   Float64       Float64 

          αF   2367.5469   77.2003     2.4413    1.9845   1050.1537    0.9952      252.4408
          αP     81.8590    3.0893     0.0977    0.1060    970.1233    0.9982      233.2027
          αM      0.9303    0.0677     0.0021    0.0025    943.5514    1.0003      226.8152
       βF[1]      1.6034   80.3508     2.5409    2.2824   1023.2686    0.9990      245.9780
       βF[2]     -3.7451   82.0924     2.5960    2.5802    961.3972    1.0008      231.1051
       βF[3]      1.1207   80.0799     2.5323    2.7361   1018.4947    0.9976      244.8304
       βF[4]     -1.6076   78.7029     2.4888    2.3270   1082.2488    0.9969      260.1560
       βF[5]     -2.7930   79.1165     2.5019    2.5547    893.2253    0.9989      214.7176
       βF[6]     -2.7102   79.5150     2.5145    1.8956    986.7812    0.9955      237.2070
       βF[7]     -2.7330   80.4840     2.5451    2.4614   1180.3496    1.0000      283.7379
       βF[8]     -0.6590   79.4045     2.5110    2.6443   1043.9858    1.0049      250.9581
       βF[9]     -3.1116   82.5449     2.6103    2.4207   1066.4970    0.9977      256.3695
      ⋮            ⋮          ⋮         ⋮          ⋮          ⋮          ⋮           ⋮
                                                                              21 rows omitted

Am I missing something? I’ve been debugging this code for hours and can’t find out why it is not producing betas with zero mean.

In this case, I think there are two reasons. The first reason is because you have set the prior for your βF coefficients to have a mean 0 and sd of 80. So in this case being off by a mean of -3.7 means that your estimated mean is off from the true mean by 1/25th of a standard deviation, which is pretty small.

I suggest using a tighter prior. It should make a difference. If you look at βP or βM, for example, they should be pretty close to zero, given their priors.

The second reason is that you are drawing too few samples given the size of your sd. You have roughly 1000 draws, which in this case is going to result in the beta’s being away from 0. Set it your samples to 12500 (e.g., prior = sample(model, Prior(), MCMCThreads(), 12500, 8) and the means will be much closer to zero. They won’t be zero, but they’ll be much closer.

I hope that helps.

1 Like

Thank you @Stephen_Wild , appreciate your help.

1 Like