Translating a TuringGLM model to a Turing model

I am trying to implement a hierarchical model and am running into issues getting the model to return the parameters I used to generate a test data set. When I implement the model using TuringGLM I get the parameters I expect, but I think I must be doing something wrong when I set up the model by hand rather than relying on a model formula. My TuringGLM formula is response ~ treatment * species + (1|genotype). The response is a measured value (rather than a count), treatment consists of three categories, species consists of four categories, and genotype consists of three categories. Technically, each genotype is specific to its species, so they could be nested, but I think the model works out the same either way.

The model I have tried to set up is:

using Turing
using LinearAlgebra: I  # identity scaling for the MvNormal covariance

@model function randomintercept_regression(y, X, idx; n_gr=length(unique(idx)), predictors=size(X, 2))
  # priors (assuming the data have been rescaled)
  α ~ Normal(0, 10)                       # population-level intercept
  β ~ filldist(Normal(0, 10), predictors) # population-level coefficients
  σ ~ Exponential(10)                     # residual SD
  # prior for the SD of the random intercepts
  # usually requires thoughtful specification
  τ ~ truncated(Cauchy(0, 10); lower=0)   # group-level SD
  αⱼ ~ filldist(Normal(0, τ), n_gr)       # group-level intercepts
  # likelihood
  ŷ = α .+ X * β .+ αⱼ[idx]
  y ~ MvNormal(ŷ, σ^2 * I)
end

where idx is the list of genotypes coded as integers, X is a matrix of the predictor variables for each individual in the study (coded as 1 and 0, with each column representing a different level of either treatment, species, or the interaction between the two), and y is the measured response of each individual. The fact that my TuringGLM model returns the correct parameter estimates suggests to me that it isn’t an issue with the structure of my data, but rather with how the model is set up. I have all the code associated with this project here, including the code for data generation, should you want to do a really deep dive. Much of the code is adapted from this resource.
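For reference, sampling is just a standard NUTS call along these lines (illustrative; the exact settings I used are in the linked code):

chain = sample(randomintercept_regression(y, X, idx), NUTS(), 2_000)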

Thanks for your help!

I don’t think you are misspecifying your Turing.jl model.
TuringGLM.jl uses different default priors, which take the underlying data into account; that might be the culprit here.

Can you post the estimates from TuringGLM.jl versus the Turing.jl model you’ve posted?

One other interesting idea would be to normalize all your data X and response y using ZScoreTransform (take a look here: Bayesian-Statistics/02-linear_regression-kidiq.jl at main · storopoli/Bayesian-Statistics · GitHub) and proceed with Normal(0, 2.5) priors for β and α, and Exponential(1) for σ.
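A minimal sketch of that standardization with StatsBase (assuming X is a numeric design matrix and y a vector of responses):

using StatsBase

X_std = standardize(ZScoreTransform, X; dims=1)  # standardize each column of X
y_std = standardize(ZScoreTransform, y)          # standardize the response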


Sure thing. Here are the estimates for the Turing.jl model:

22×8 DataFrame
 Row │ parameters        mean          std        naive_se    mcse        ess        rhat     ess_per_sec 
     │ Symbol            Float64       Float64    Float64     Float64     Float64    Float64  Float64     
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ α                 -3.45131      5.76711    0.0911859   0.252182     230.766   1.02397    1.63194
   2 │ treat1_Control     0.813562     4.82649    0.0763135   0.211874     255.586   1.01681    1.80746
   3 │ treat2_Triticale  -1.7054       4.33897    0.0686052   0.193385     365.632   1.02321    2.58569
   4 │ treat2_Wheat      -1.96195      4.34805    0.0687488   0.197807     318.722   1.027      2.25395
   5 │ treat2_Barley      2.43193      4.34733    0.0687373   0.197601     320.813   1.02696    2.26874
   6 │ treat2_Oats       -2.23604      4.35545    0.0688658   0.19943      308.04    1.02776    2.17841
   7 │ treat1_MeJA       -3.31079      4.82681    0.0763186   0.211433     256.962   1.01672    1.81719
   8 │ treat1_Insect     -3.10834      4.82561    0.0762996   0.211556     257.02    1.01669    1.8176
   9 │ σ                  1.07758      0.0523837  0.00082826  0.00407205    20.7791  1.28952    0.146946
  10 │ τ                  0.152454     0.139013   0.002198    0.0155327     11.6907  1.71629    0.0826748
  11 │ αⱼ[1]             -0.000636618  0.124812   0.00197345  0.00809698    52.625   1.08083    0.372155
  12 │ αⱼ[2]             -0.0461119    0.107025   0.00169221  0.00487228    83.2762  1.05705    0.588916
  13 │ αⱼ[3]              0.0772738    0.152267   0.00240756  0.0137612     16.1005  1.41414    0.11386
  14 │ αⱼ[4]             -0.0550869    0.149021   0.00235623  0.0141615     14.2971  1.51724    0.101107
  15 │ αⱼ[5]             -0.0663971    0.170312   0.00269288  0.016902      13.7304  1.55641    0.0970988
  16 │ αⱼ[6]              0.0555591    0.125597   0.00198586  0.00879491    30.5549  1.17043    0.216079
  17 │ αⱼ[7]              0.0147474    0.11526    0.00182243  0.0077317     53.4461  1.08289    0.377962
  18 │ αⱼ[8]             -0.0221508    0.161494   0.00255344  0.0157733     14.5065  1.49708    0.102588
  19 │ αⱼ[9]             -0.00909111   0.099925   0.00157995  0.00275826  1996.17    1.00353   14.1166
  20 │ αⱼ[10]            -0.0792182    0.156167   0.00246922  0.0152744     14.0818  1.49238    0.0995839
  21 │ αⱼ[11]             0.00108857   0.197207   0.00311812  0.0201408     15.2257  1.44926    0.107674
  22 │ αⱼ[12]            -0.0331863    0.1292     0.00204284  0.0103636     19.8629  1.29653    0.140467

And here are the estimates for the TuringGLM.jl model:

20×8 DataFrame
 Row │ parameters  mean        std        naive_se     mcse         ess       rhat      ess_per_sec 
     │ Symbol      Float64     Float64    Float64      Float64      Float64   Float64   Float64     
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │ α           -0.30134    0.185182   0.00292799   0.00596053   1051.16   1.00139       32.5437
   2 │ β[1]        -4.62083    0.216361   0.00342096   0.00692687   1030.89   1.00269       31.9162
   3 │ β[2]        -4.0301     0.226085   0.00357472   0.00873237    682.446  1.00343       21.1284
   4 │ β[3]        -4.32156    0.223349   0.00353146   0.00798252    881.987  1.00341       27.3061
   5 │ β[4]        -3.88406    0.162535   0.00256991   0.00304312   3126.17   0.99978       96.7853
   6 │ β[5]        -4.08865    0.164911   0.00260747   0.00377165   2972.37   1.00138       92.0239
   7 │ σ            1.05227    0.0476444  0.000753325  0.000765722  4253.41   1.0003       131.684
   8 │ τ            0.102047   0.0878335  0.00138877   0.00380113    529.434  1.0037        16.3911
   9 │ zⱼ[1]       -0.0284821  0.919094   0.0145322    0.0130367    4590.23   0.999867     142.112
  10 │ zⱼ[2]       -0.283288   0.981594   0.0155204    0.0151085    3973.92   0.999478     123.031
  11 │ zⱼ[3]        0.187311   0.937938   0.0148301    0.0128819    4123.02   1.00007      127.648
  12 │ zⱼ[4]       -0.117119   0.958306   0.0151521    0.0140993    4466.12   0.999459     138.27
  13 │ zⱼ[5]       -0.222428   0.966174   0.0152765    0.0150198    3850.42   1.00036      119.208
  14 │ zⱼ[6]        0.22366    0.952422   0.0150591    0.0138048    4170.4    0.999588     129.115
  15 │ zⱼ[7]        0.121774   0.956345   0.0151211    0.0188772    3589.15   0.999184     111.119
  16 │ zⱼ[8]        0.13658    0.942066   0.0148954    0.0166025    3821.37   1.0002       118.309
  17 │ zⱼ[9]       -0.0184732  0.939687   0.0148578    0.0169186    3493.39   0.999854     108.154
  18 │ zⱼ[10]      -0.236547   0.923681   0.0146047    0.0138926    4463.18   0.999422     138.179
  19 │ zⱼ[11]       0.151902   0.969382   0.0153273    0.0145629    4167.17   1.00028      129.014
  20 │ zⱼ[12]      -0.0126014  0.933396   0.0147583    0.0146358    4166.88   0.999517     129.006

Based on my sleuthing, it looks like the TuringGLM model is incorporating Barley and Control into the intercept and then estimating the other levels relative to them. I back-calculated them and the math was pretty close.
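The zⱼ rows in the TuringGLM output also make me think it uses a non-centered parameterization, where the group intercepts are recovered as τ .* zⱼ rather than sampled directly. If that's right (I'm not certain it is), the equivalent change to my hand-written model would be something like:

# same packages as the model above (Turing, LinearAlgebra)
@model function randomintercept_regression_nc(y, X, idx; n_gr=length(unique(idx)), predictors=size(X, 2))
  α ~ Normal(0, 10)
  β ~ filldist(Normal(0, 10), predictors)
  σ ~ Exponential(10)
  τ ~ truncated(Cauchy(0, 10); lower=0)
  zⱼ ~ filldist(Normal(0, 1), n_gr)  # standardized group offsets
  αⱼ = τ .* zⱼ                       # group intercepts recovered from zⱼ
  ŷ = α .+ X * β .+ αⱼ[idx]
  y ~ MvNormal(ŷ, σ^2 * I)
end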

For reference, the parameters used in the data generation were:

19×2 DataFrame
 Row │ parameter   coeff     
     │ String15    Float64   
─────┼───────────────────────
   1 │ Control      0.0
   2 │ Insect      -4.10827
   3 │ MeJA        -4.17167
   4 │ Barley       0.0
   5 │ Oats        -4.97833
   6 │ Triticale   -4.11681
   7 │ Wheat       -4.51245
   8 │ Barley1      0.634788
   9 │ Barley2      0.731347
  10 │ Barley3      0.620631
  11 │ Oats1        0.262506
  12 │ Oats2        0.480827
  13 │ Oats3        0.681654
  14 │ Triticale1   0.956313
  15 │ Triticale2   0.206814
  16 │ Triticale3   0.603935
  17 │ Wheat1       0.786783
  18 │ Wheat2       0.605044
  19 │ Wheat3       0.166197

How would normalization work with predictors that are purely 1 or 0? I looked into normalization, but wasn't sure if it would make a difference when the variables are dummy coded.

I tried again, this time adjusting the X matrix to exclude Barley and Control so they would be incorporated into the intercept. The new parameter estimates are:

20×8 DataFrame
 Row │ parameters  mean          std        naive_se     mcse        ess       rhat      ess_per_sec 
     │ Symbol      Float64       Float64    Float64      Float64     Float64   Float64   Float64     
─────┼───────────────────────────────────────────────────────────────────────────────────────────────
   1 │ α           -0.218272     0.176584   0.00279204   0.00410713  1599.31   1.00187      18.9299
   2 │ β[1]        -4.10688      0.221082   0.00349562   0.00534045  1838.74   1.00322      21.7639
   3 │ β[2]        -4.40813      0.216661   0.00342572   0.00576301  1662.02   1.00416      19.6721
   4 │ β[3]        -4.69131      0.21747    0.0034385    0.00515242  1814.77   1.00265      21.4801
   5 │ β[4]        -4.13625      0.166991   0.00264036   0.00626846   322.938  1.01617       3.82238
   6 │ β[5]        -3.92708      0.161498   0.00255351   0.0046016   1530.07   1.00321      18.1104
   7 │ σ            1.05389      0.0513702  0.000812234  0.0023493    120.992  1.02733       1.4321
   8 │ τ            0.111642     0.0884269  0.00139815   0.00555255   241.995  1.01163       2.86432
   9 │ αⱼ[1]       -0.000398632  0.115827   0.00183138   0.00189341  3743.89   1.0005       44.3137
  10 │ αⱼ[2]       -0.0353279    0.126083   0.00199355   0.00288435  1894.11   1.00021      22.4192
  11 │ αⱼ[3]        0.0359535    0.124245   0.00196448   0.00310668  1929.02   0.999885     22.8324
  12 │ αⱼ[4]       -0.00971979   0.114741   0.00181422   0.00177179  2993.38   0.99945      35.4304
  13 │ αⱼ[5]       -0.0272379    0.126527   0.00200056   0.00251181  2555.22   1.00055      30.2444
  14 │ αⱼ[6]        0.0423152    0.126711   0.00200348   0.00344255  1478.26   1.00011      17.4971
  15 │ αⱼ[7]        0.00848105   0.114706   0.00181365   0.0021953   3374.93   0.999696     39.9467
  16 │ αⱼ[8]        0.0131566    0.116819   0.00184707   0.00195295  2963.07   0.999978     35.0717
  17 │ αⱼ[9]       -0.0131799    0.116765   0.00184621   0.00276529  2158.81   1.00227      25.5522
  18 │ αⱼ[10]      -0.0341412    0.119848   0.00189497   0.00311182  1546.84   1.00051      18.3088
  19 │ αⱼ[11]       0.0250325    0.116745   0.0018459    0.00275691  2733.87   1.00108      32.3589
  20 │ αⱼ[12]       0.000353223  0.116472   0.00184158   0.00219533  3462.41   0.999878     40.9821

These are much closer to the TuringGLM estimates, but there are still some slight differences.

The number of parameters is different. I think TuringGLM is coding each categorical variable with K−1 levels (dropping a reference level); try using the same categorical coding for the plain Turing data.
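A minimal sketch of that coding done by hand (assuming a DataFrame df with :treatment and :species columns, taking Control and Barley as the reference levels your back-calculation points to, and omitting the interaction columns for brevity):

treatment_levels = ["Insect", "MeJA"]              # "Control" is the reference level
species_levels   = ["Oats", "Triticale", "Wheat"]  # "Barley" is the reference level

# one dummy column per non-reference level
X = Float64.(hcat(
    [df.treatment .== t for t in treatment_levels]...,
    [df.species .== s for s in species_levels]...,
))

StatsModels.jl's default DummyCoding does the same thing through a @formula, if you'd rather not build the matrix yourself.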