# How to obtain numeric arrays from a linear mixed model formula?

Hello. I have a data frame (“dat”, with variables x1, y, and id) and a formula (@formula(y ~ 1 + x1 + (1 | id))). I want to obtain the following numeric arrays:

• y: response vector
• X: design matrix for fixed effect, with two columns: intercept and x1
• Z: design matrix for random effects: here just intercept
• id: cluster identifier

I’m aware that the StatsModels.jl package allows me to parse a fixed effect model with functions such as “schema, apply_schema, modelcols”.

My question is how do I do it with mixed effect models? Thank you.

I think the easiest way to obtain these arrays is to create the `LinearMixedModel` object and extract the properties

``````julia> using MixedModels, RCall

julia> m1 = LinearMixedModel(@formula(height ~ 1 + age + (1|Subject)), rcopy(R"nlme::Oxboys"));

julia> propertynames(m1)
(:formula, :sqrtwts, :A, :L, :optsum, :θ, :theta, :β, :beta, :λ, :lambda, :stderror, :σ, :sigma, :b, :u, :lowerbd, :X, :y, :rePCA, :reterms, :feterms, :objective, :pvalues)

julia> typeof(m1.y)   # y as a vector
Array{Float64,1}

julia> typeof(m1.X)  # X as a two-column matrix
Array{Float64,2}

julia> typeof(Matrix(first(m1.reterms))) # Z as a dense matrix (it is not actually used in this form)
Array{Float64,2}
``````

You can obtain the grouping factor, `Subject` in this case, for the random effects term by taking apart the `CategoricalTerm` in the first of the `reterms`. (This is the only `reterm` in this case but the design allows for multiple random effects terms.) If you just want the indices into the levels you can get them explicitly as the `refs` property.

``````julia> propertynames(first(m1.reterms))
(:trm, :refs, :cnames, :z, :wtz, :λ, :inds, :adjA, :scratch)

julia> typeof(first(m1.reterms).refs)
Array{Int32,1}

julia> typeof(first(m1.reterms).trm)
StatsModels.CategoricalTerm{DummyCoding,String,25}
``````

To get the levels you must take apart the `trm` itself.

``````julia> propertynames(first(m1.reterms).trm)
(:sym, :contrasts)

julia> propertynames(first(m1.reterms).trm.contrasts)
(:matrix, :termnames, :levels, :contrasts, :invindex)
``````

I realize I may not be answering the question that you intended. If you want to find out how the random-effects terms are parsed then you should look at https://github.com/JuliaStats/MixedModels.jl/blob/master/src/randomeffectsterm.jl

1 Like

Thank you Prof. Bates. I tried your solution and it works great, so I think this is the way to go.

@dmbates solution will work, but since you asked how to do it using “standard” statsmodels functions, the trick is to specify that you want to use a `MixedModel` (or `LinearMixedModel` or `GeneralizedMixedModel`) context (the third argument) when you `apply_schema`:

``````julia> using MixedModels, StatsModels
[ Info: Precompiling MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316]

julia> dat = (y = rand(100), x1 = rand(100), id = repeat([:a, :b, :c, :d], 25));

julia> f = @formula(y ~ 1 + x1 + (1|id))
FormulaTerm
Response:
y(unknown)
Predictors:
1
x1(unknown)
(id)->1 | id

julia> f_sch = apply_schema(f, schema(dat), MixedModel)
FormulaTerm
Response:
y(continuous)
Predictors:
1
x1(continuous)
(1 | id)

julia> y, (X, Z) = modelcols(f_sch, dat)
``````

(The last step uses tuple destructuring, sicne `modelcols` of a formula term returns a tuple of the left-hand and right-hand sides, and the right-hand size is itself a tuple the fixed effects matrix and one or more random effects matrices…)

Just to prove that this gives you what you were looking for:

``````julia> y == dat.y
true

julia> X == [ones(100) dat.x1]
true

julia> size(Z)
(100, 4)

julia> Z[:,1] == (dat.id .== :a)
true

julia> Z[:,2] == (dat.id .== :b)
true

julia> Z.refs[1:6]
6-element Array{Int32,1}:
1
2
3
4
1
2

julia> Z.levels
4-element Array{Symbol,1}:
:a
:b
:c
:d

julia> dat.id == Z.levels[Z.refs]
true
``````

Note that `Z` is a special `ReMat` type that’s internal to MixedModels. It’s an `AbstractArray` but uses a special, compact representation of the `Z` matrix. If you want to get the full matrix you can do something like

``````julia> Matrix(Z)
100×4 Array{Float64,2}:
1.0  0.0  0.0  0.0
0.0  1.0  0.0  0.0
0.0  0.0  1.0  0.0
0.0  0.0  0.0  1.0
1.0  0.0  0.0  0.0
⋮
1.0  0.0  0.0  0.0
0.0  1.0  0.0  0.0
0.0  0.0  1.0  0.0
0.0  0.0  0.0  1.0
``````

which creates a dense matrix. If you want a sparse matrix, you can do:

``````julia> sparse(Z)
1.0  0.0  0.0  0.0
0.0  1.0  0.0  0.0
0.0  0.0  1.0  0.0
0.0  0.0  0.0  1.0
1.0  0.0  0.0  0.0
⋮
1.0  0.0  0.0  0.0
0.0  1.0  0.0  0.0
0.0  0.0  1.0  0.0
0.0  0.0  0.0  1.0
``````

(It’s an `Adjoint` because `ReMat` also stores the adjoint of the sparse Z internally…)

1 Like

Also as a follow up, this is the first thing that MixedModels does internally when it’s creating a `LinearMixedModel`. See these lines in the source code.

2 Likes

Thank you Dave. This is great to know!

1 Like

Hi Dave,

The same code you used above returns an error here, for f_sch = apply_schema(f, schema(dat), MixedModel).
It reads “UndefVarError: schema not defined”. Using Julia 1.4.1, and MixedModels 1.1.6 and StatsModels 0.5.0.

Do you have any clue?

Do you have the line with

using MixedModels, StatsModels

in your example? The schema function is in the StatsModels namespace and is not re-exported by MixedModels. I suppose we could re-export it from MixedModels if this was important.

1 Like

Doug’s right, you’ll have to have `using StatsModels` somewhere. As for whether to re-export schema, I suppose it might be useful but at that point might as well just re-export everything. That seems like overkill though…

Yes I have the line “using MixedModels, StatsModels”

You say you are using MixedModels v1.1.6 which could be the problem. The current release version is 2.3.0

Ahh… I did not think of it coz I installed it just yesterday. Now I have it updated to the current release, and it works. Thanks for the help!

1 Like