How to obtain numeric arrays from a linear mixed model formula?

kai · November 28, 2019, 6:46pm

Hello. I have a data frame (“dat”, with variables x1, y, and id) and a formula (@formula(y ~ 1 + x1 + (1 | id))). I want to obtain the following numeric arrays:

y: response vector
X: design matrix for fixed effect, with two columns: intercept and x1
Z: design matrix for random effects: here just intercept
id: cluster identifier

I’m aware that the StatsModels.jl package allows me to parse a fixed effect model with functions such as “schema, apply_schema, modelcols”.

My question is how do I do it with mixed effect models? Thank you.

dmbates · November 30, 2019, 4:25pm

I think the easiest way to obtain these arrays is to create the LinearMixedModel object and extract the properties

julia> using MixedModels, RCall

julia> m1 = LinearMixedModel(@formula(height ~ 1 + age + (1|Subject)), rcopy(R"nlme::Oxboys"));

julia> propertynames(m1)
(:formula, :sqrtwts, :A, :L, :optsum, :θ, :theta, :β, :beta, :λ, :lambda, :stderror, :σ, :sigma, :b, :u, :lowerbd, :X, :y, :rePCA, :reterms, :feterms, :objective, :pvalues)

julia> typeof(m1.y)   # y as a vector
Array{Float64,1}

julia> typeof(m1.X)  # X as a two-column matrix
Array{Float64,2}

julia> typeof(Matrix(first(m1.reterms))) # Z as a dense matrix (it is not actually used in this form)
Array{Float64,2}

You can obtain the grouping factor, Subject in this case, for the random effects term by taking apart the CategoricalTerm in the first of the reterms. (This is the only reterm in this case but the design allows for multiple random effects terms.) If you just want the indices into the levels you can get them explicitly as the refs property.

julia> propertynames(first(m1.reterms))
(:trm, :refs, :cnames, :z, :wtz, :λ, :inds, :adjA, :scratch)

julia> typeof(first(m1.reterms).refs)
Array{Int32,1}

julia> typeof(first(m1.reterms).trm)
StatsModels.CategoricalTerm{DummyCoding,String,25}

To get the levels you must take apart the trm itself.

julia> propertynames(first(m1.reterms).trm)
(:sym, :contrasts)

julia> propertynames(first(m1.reterms).trm.contrasts)
(:matrix, :termnames, :levels, :contrasts, :invindex)

I realize I may not be answering the question that you intended. If you want to find out how the random-effects terms are parsed then you should look at https://github.com/JuliaStats/MixedModels.jl/blob/master/src/randomeffectsterm.jl

kai · December 4, 2019, 12:52am

Thank you Prof. Bates. I tried your solution and it works great, so I think this is the way to go.

dave.f.kleinschmidt · February 26, 2020, 8:54pm

@dmbates solution will work, but since you asked how to do it using “standard” statsmodels functions, the trick is to specify that you want to use a MixedModel (or LinearMixedModel or GeneralizedMixedModel) context (the third argument) when you apply_schema:

julia> using MixedModels, StatsModels
[ Info: Precompiling MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316]

julia> dat = (y = rand(100), x1 = rand(100), id = repeat([:a, :b, :c, :d], 25));

julia> f = @formula(y ~ 1 + x1 + (1|id))
FormulaTerm
Response:
  y(unknown)
Predictors:
  1
  x1(unknown)
  (id)->1 | id

julia> f_sch = apply_schema(f, schema(dat), MixedModel)
FormulaTerm
Response:
  y(continuous)
Predictors:
  1
  x1(continuous)
  (1 | id)

julia> y, (X, Z) = modelcols(f_sch, dat)

(The last step uses tuple destructuring, sicne modelcols of a formula term returns a tuple of the left-hand and right-hand sides, and the right-hand size is itself a tuple the fixed effects matrix and one or more random effects matrices…)

Just to prove that this gives you what you were looking for:

julia> y == dat.y
true

julia> X == [ones(100) dat.x1]
true

julia> size(Z)
(100, 4)

julia> Z[:,1] == (dat.id .== :a)
true

julia> Z[:,2] == (dat.id .== :b)
true

julia> Z.refs[1:6]
6-element Array{Int32,1}:
 1
 2
 3
 4
 1
 2

julia> Z.levels
4-element Array{Symbol,1}:
 :a
 :b
 :c
 :d

julia> dat.id == Z.levels[Z.refs]
true

Note that Z is a special ReMat type that’s internal to MixedModels. It’s an AbstractArray but uses a special, compact representation of the Z matrix. If you want to get the full matrix you can do something like

julia> Matrix(Z)
100×4 Array{Float64,2}:
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0
 1.0  0.0  0.0  0.0
 ⋮
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0

which creates a dense matrix. If you want a sparse matrix, you can do:

julia> sparse(Z)
100×4 LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int32}}:
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0
 1.0  0.0  0.0  0.0
 ⋮
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0

(It’s an Adjoint because ReMat also stores the adjoint of the sparse Z internally…)

dave.f.kleinschmidt · February 26, 2020, 8:57pm

Also as a follow up, this is the first thing that MixedModels does internally when it’s creating a LinearMixedModel. See these lines in the source code.

kai · March 2, 2020, 6:56pm

Thank you Dave. This is great to know!

emreK · June 10, 2020, 2:34pm

Hi Dave,

The same code you used above returns an error here, for f_sch = apply_schema(f, schema(dat), MixedModel).
It reads “UndefVarError: schema not defined”. Using Julia 1.4.1, and MixedModels 1.1.6 and StatsModels 0.5.0.

Do you have any clue?

dmbates · June 10, 2020, 3:05pm

Do you have the line with

using MixedModels, StatsModels

in your example? The schema function is in the StatsModels namespace and is not re-exported by MixedModels. I suppose we could re-export it from MixedModels if this was important.

dave.f.kleinschmidt · June 10, 2020, 3:13pm

Doug’s right, you’ll have to have using StatsModels somewhere. As for whether to re-export schema, I suppose it might be useful but at that point might as well just re-export everything. That seems like overkill though…

emreK · June 10, 2020, 6:04pm

Yes I have the line “using MixedModels, StatsModels”

dmbates · June 11, 2020, 1:18pm

You say you are using MixedModels v1.1.6 which could be the problem. The current release version is 2.3.0

emreK · June 11, 2020, 1:54pm

Ahh… I did not think of it coz I installed it just yesterday. Now I have it updated to the current release, and it works. Thanks for the help!

Topic		Replies	Views
Implement Mixed Models with sparse X and Z matrices Machine Learning question , package , sparse	7	869	November 3, 2020
Getting started with mixed models Data	4	689	June 24, 2020
Method Error when using LinearMixedModel function Specific Domains statistics	3	234	October 6, 2022
MixedModels and Random Interactions Statistics question , package , statistics , error	6	1553	August 17, 2017
Dictionary for random effects in MultiLevel.jl model General Usage	1	413	June 13, 2023

How to obtain numeric arrays from a linear mixed model formula?

Related topics