How to obtain numeric arrays from a linear mixed model formula?

Hello. I have a data frame (“dat”, with variables x1, y, and id) and a formula (@formula(y ~ 1 + x1 + (1 | id))). I want to obtain the following numeric arrays:

  • y: response vector
  • X: design matrix for fixed effect, with two columns: intercept and x1
  • Z: design matrix for random effects: here just intercept
  • id: cluster identifier

I’m aware that the StatsModels.jl package allows me to parse a fixed effect model with functions such as “schema, apply_schema, modelcols”.

My question is how do I do it with mixed effect models? Thank you.

I think the easiest way to obtain these arrays is to create the LinearMixedModel object and extract the properties

julia> using MixedModels, RCall

julia> m1 = LinearMixedModel(@formula(height ~ 1 + age + (1|Subject)), rcopy(R"nlme::Oxboys"));

julia> propertynames(m1)
(:formula, :sqrtwts, :A, :L, :optsum, :θ, :theta, :β, :beta, :λ, :lambda, :stderror, :σ, :sigma, :b, :u, :lowerbd, :X, :y, :rePCA, :reterms, :feterms, :objective, :pvalues)

julia> typeof(m1.y)   # y as a vector

julia> typeof(m1.X)  # X as a two-column matrix

julia> typeof(Matrix(first(m1.reterms))) # Z as a dense matrix (it is not actually used in this form)

You can obtain the grouping factor, Subject in this case, for the random effects term by taking apart the CategoricalTerm in the first of the reterms. (This is the only reterm in this case but the design allows for multiple random effects terms.) If you just want the indices into the levels you can get them explicitly as the refs property.

julia> propertynames(first(m1.reterms))
(:trm, :refs, :cnames, :z, :wtz, :λ, :inds, :adjA, :scratch)

julia> typeof(first(m1.reterms).refs)

julia> typeof(first(m1.reterms).trm)

To get the levels you must take apart the trm itself.

julia> propertynames(first(m1.reterms).trm)
(:sym, :contrasts)

julia> propertynames(first(m1.reterms).trm.contrasts)
(:matrix, :termnames, :levels, :contrasts, :invindex)

I realize I may not be answering the question that you intended. If you want to find out how the random-effects terms are parsed then you should look at

1 Like

Thank you Prof. Bates. I tried your solution and it works great, so I think this is the way to go.