How to estimate uncertainty in a LinearMixedModel?

In this example, I want to estimate the uncertainty around the beta value for each Cut & Clarity pair. I'm not sure how to interpret the allpars data. Is it producing a single sigma shared across all values of Cut & Clarity? Do I need to simulate in order to get what I want? I am not trying to get prediction error; I just want the uncertainty around the expected value at each point.

using RDatasets, MixedModels, DataFrames, Random  # Random is needed for Random.GLOBAL_RNG

d = dataset("ggplot2", "diamonds")
form = @formula(Price ~ (1|Cut & Clarity))
modelfit = fit(LinearMixedModel, form, d)
pb = parametricbootstrap(Random.GLOBAL_RNG, 1000, modelfit)
DataFrame(pb.allpars)

#= julia> DataFrame(pb.allpars)
3000×5 DataFrame
  Row │ iter   type    group          names        value
      │ Int64  String  String?        String?      Float64
──────┼─────────────────────────────────────────────────────
    1 │     1  β       missing        (Intercept)  4054.44
    2 │     1  σ       Cut & Clarity  (Intercept)   831.115
    3 │     1  σ       residual       missing      3907.46
    4 │     2  β       missing        (Intercept)  3925.81
    5 │     2  σ       Cut & Clarity  (Intercept)   726.458
    6 │     2  σ       residual       missing      3908.1
    7 │     3  β       missing        (Intercept)  3941.74
    8 │     3  σ       Cut & Clarity  (Intercept)   621.993
    9 │     3  σ       residual       missing      3918.18
   10 │     4  β       missing        (Intercept)  3987.62
   11 │     4  σ       Cut & Clarity  (Intercept)   811.536
   12 │     4  σ       residual       missing      3929.52
   13 │     5  β       missing        (Intercept)  3818.16
   14 │     5  σ       Cut & Clarity  (Intercept)   609.15
   15 │     5  σ       residual       missing      3883.62
   16 │     6  β       missing        (Intercept)  3786.48
   17 │     6  σ       Cut & Clarity  (Intercept)   850.474
  ⋮   │   ⋮      ⋮           ⋮             ⋮          ⋮
 2985 │   995  σ       residual       missing      3911.85
 2986 │   996  β       missing        (Intercept)  3802.83
 2987 │   996  σ       Cut & Clarity  (Intercept)   720.439
 2988 │   996  σ       residual       missing      3898.91
 2989 │   997  β       missing        (Intercept)  3782.61
 2990 │   997  σ       Cut & Clarity  (Intercept)   734.746
 2991 │   997  σ       residual       missing      3899.4
 2992 │   998  β       missing        (Intercept)  3692.06
 2993 │   998  σ       Cut & Clarity  (Intercept)   677.055
 2994 │   998  σ       residual       missing      3916.26
 2995 │   999  β       missing        (Intercept)  3856.67
 2996 │   999  σ       Cut & Clarity  (Intercept)   810.09
 2997 │   999  σ       residual       missing      3912.06
 2998 │  1000  β       missing        (Intercept)  3725.55
 2999 │  1000  σ       Cut & Clarity  (Intercept)   743.319
 3000 │  1000  σ       residual       missing      3906.36
                                           2966 rows omitted =#

If you're just looking for confidence intervals, have you looked at DataFrame(shortestcovint(pb))?

This is shown in the docs: Parametric bootstrap for mixed-effects models · MixedModels
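For reference, that is roughly this, reusing the pb object from above (the coverage level defaults to 0.95, if I remember correctly):

# shortest coverage interval for each model parameter (the fixed-effect
# intercept, the Cut & Clarity σ, and the residual σ), pooled across the
# bootstrap replicates
DataFrame(shortestcovint(pb))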

That produces an interval for a single Cut & Clarity parameter, but I was looking for an interval for every level of Cut & Clarity. I want to make a coefficient plot showing the credible range for each combination of Cut and Clarity.

pbdf = DataFrame(pb.allpars)
# one interval per (type, group, names) combination, i.e. per model parameter
combine(groupby(pbdf, [:type, :group, :names]), :value => shortestcovint => :interval)

#= 3×4 DataFrame
 Row │ type    group          names        interval
     │ String  String?        String?      Tuple…
─────┼─────────────────────────────────────────────────────
   1 │ β       missing        (Intercept)  (3600.75, 4104.61)
   2 │ σ       Cut & Clarity  (Intercept)  (552.975, 906.98)
   3 │ σ       residual       missing      (3892.35, 3939.2) =#

This might be what I want. Does it make sense?

using RDatasets, MixedModels, DataFrames, Random   # Random for Random.GLOBAL_RNG
using Statistics, StatsBase                         # mean, sem
using AlgebraOfGraphics, CairoMakie                 # data, mapping, draw, Errorbars

d = dataset("ggplot2", "diamonds")
form = @formula(Price ~ (1|Cut & Clarity))
modelfit = fit(LinearMixedModel, form, d)
pb = parametricbootstrap(Random.GLOBAL_RNG, 1000, modelfit)
pbdf = DataFrame(pb.allpars)
combine(groupby(pbdf, [:type, :group, :names]), :value => shortestcovint => :interval)

d[!, :sim] = simulate(modelfit)  # a single simulated response vector from the fitted model
let c = combine(groupby(d, [:Cut, :Clarity]), :sim .=> [mean, sem])
    layers = (data(c)
              * visual(Errorbars)
              * mapping(:Cut, :sim_mean, :sim_sem => (x -> 2x); layout=:Clarity))
    draw(layers; axis=(; xticklabelrotation=π/8))
end


Without knowing your inferential and presentation goals and potentially more about your study design, I don't know whether it makes sense for your purposes. I don't think a mixed model is the way to analyze the diamonds data, but the presentation you give is a way to get regularized “estimates” (technically predictions) of all those things.

Have you looked at MixedModelsMakie? caterpillar and RanefInfo will display similar information using the conditional variances instead of the bootstrap replicates.
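Untested sketch, but something along these lines should produce the caterpillar plot for that grouping factor (check the MixedModelsMakie docs for the exact signature):

using CairoMakie, MixedModelsMakie

# dot-and-interval display of the conditional modes (BLUPs) for each
# Cut & Clarity level, with intervals based on the conditional standard deviations
caterpillar(modelfit, Symbol("Cut & Clarity"))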

I think this is the core thing I'm getting stuck on – what do you mean by "beta"? Usually, "beta" refers to the fixed-effect regression coefficients. The conditional modes / best linear unbiased predictions (BLUPs) / random effects are something else.

Ah yes, beta was not what I meant. I want to show my results in terms of the uncertainty around the expected outcome associated with each value of the categorical parameters. I find this outcome-scale presentation easier to interpret than raw parameter estimates, especially for generalized linear mixed models. It looks like MixedModelsMakie.caterpillar produces estimates in terms of the parameter values, rather than best predictions.

Based on

d[!, :sim] = simulate(modelfit)
let c = combine(groupby(d, [:Cut, :Clarity]), :sim .=> [mean, sem])
    layers = (data(c)
              * visual(Errorbars)
              * mapping(:Cut, :sim_mean, :sim_sem => (x -> 2x); layout=:Clarity))
    draw(layers; axis=(; xticklabelrotation=π/8))
end

Running this code a few times, the predictions jump around quite a bit. E.g.

[images: plots from two separate runs of the code above, showing different group means]

How should I get reliable estimates/predictions for each level of the categories?

simulate is only a single replication, so things will tend to change a fair amount for any given parameter value. Moreover, the random effects in each replication are completely new draws from the random-effects distribution, not the BLUPs from the fit. This is one of the critical ways that random effects differ from fixed effects – you estimate the distribution of the random effects and predict the individual values. (This is also why the BLUPs aren't part of the bootstrap summary – they aren't parameters, and the bootstrap summary only stores parameters.)
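A quick way to see this, reusing d and modelfit from above (sim1 and sim2 are just illustrative column names):

using Statistics

d2 = copy(d)
d2[!, :sim1] = simulate(modelfit)   # first replication
d2[!, :sim2] = simulate(modelfit)   # second replication
# the per-cell means differ between the two replications because each call
# draws fresh random-effects values rather than reusing the fitted BLUPs
combine(groupby(d2, [:Cut, :Clarity]), [:sim1, :sim2] .=> mean)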

Those predictions are relative to the population-level values, i.e. centered at the corresponding fixed effects, so if you want to plot those for a simple intercepts-only model, you can do something like:

grp = Symbol("Cut & Clarity")
df = leftjoin(DataFrame(raneftables(modelfit)[grp]), DataFrame(condVartables(modelfit)[grp]); on=grp)
select!(df, 
        grp => ByRow(identity) => [:Cut, :Clarity], 
        "(Intercept)" => "mean", 
        :ฯƒ => ByRow(only) => "std")

layers = data(df) * visual(Errorbars) *
    mapping(:Cut, :mean, :std => (x -> 1.96x); layout=:Clarity)  # ±1.96 conditional std. devs. for a ≈95% interval
draw(layers; axis=(; xticklabelrotation=π/8))
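If you also want those bounds as numbers rather than only error bars (say, for a table), something like this should work on the df built above (lower/upper are arbitrary column names):

# explicit ≈95% bounds: mean ± 1.96 conditional standard deviations
transform!(df,
    [:mean, :std] => ByRow((m, s) -> m - 1.96s) => :lower,
    [:mean, :std] => ByRow((m, s) -> m + 1.96s) => :upper)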

A similar result could also be achieved with MixedModelsMakie:

using CairoMakie, MixedModelsMakie
re = ranefinfo(modelfit, grp)
re.ranef[:, 1] .+= fixef(modelfit)[1] # 'absolute' prediction for each group instead of relative to the population effect
caterpillar!(Figure(), re)

I'm a little hesitant to provide more concrete advice without knowing more about your inferential problem (and I probably don't have time at the moment to help with the details), because this feels like an XY Problem.
