I am using MixedModels fit(@formula(y ~ a + b + c + d + e), data) and it says “Warning: Fixed-effects matrix is rank-deficient”. I want to diagnose the problem. Can I find out which combinations of columns are not full rank? For example, is it a single categorical variable c that has only one value, or a pair of columns a and b that are dependent? Can I easily extract this information from the formula and data?
The docs discuss the fundamental problem and why a universal solution is difficult.
You can look at the numerical rank of the model matrix to get an idea about how many “extra” columns there are, but that doesn’t tell you how many extra terms you have in your formula. Using orthogonal contrasts for categorical variables may help with numerical rank.
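As a rough illustration of the rank check, here is a minimal sketch that uses only the LinearAlgebra standard library. The matrix X is made-up toy data standing in for the fixed-effects model matrix of your fit; the last column deliberately duplicates the third.

```julia
using LinearAlgebra

# Toy design matrix (hypothetical data): intercept, a, b, and a fourth column
# that duplicates b exactly, so one column is redundant.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [0.0, 1.0, 0.0, 1.0, 1.0]
X = hcat(ones(5), a, b, b)

ncols = size(X, 2)       # number of columns in the model matrix
r = rank(X)              # numerical rank, computed from the singular values
n_extra = ncols - r      # number of linearly redundant columns

println("columns = ", ncols, ", rank = ", r, ", redundant = ", n_extra)
```

The difference counts redundant columns only. A single categorical term can contribute several columns, so it does not translate directly into a number of redundant terms.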
In practice, you can also see which predictors were dropped as part of the automatic attempts to handle rank deficiency: their estimate will be -0.0 (note the negative sign) and their standard error will be NaN.
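A small sketch of how that check could be automated. The three vectors below are invented for illustration; with a fitted model they would presumably come from coefnames(m), coef(m) and stderror(m), which is an assumption about your setup rather than something verified here.

```julia
# Hypothetical coefficient table, shaped like the output after one column
# has been dropped (values are made up).
cnames = ["(Intercept)", "a", "b", "c: B"]
est    = [1.25, 0.40, -0.0, -0.73]
se     = [0.10, 0.05, NaN, 0.21]

# -0.0 == 0.0 is true in Julia, so the sign bit has to be tested explicitly.
isnegzero(x) = iszero(x) && signbit(x)

# A predictor counts as dropped when its estimate is -0.0 and its
# standard error is NaN.
dropped = [n for (n, e, s) in zip(cnames, est, se) if isnegzero(e) && isnan(s)]

println("dropped predictors: ", dropped)
```

Testing the sign bit matters because a genuine estimate of exactly 0.0 would otherwise be indistinguishable from a dropped column.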
Alternatively, the evaluated rank, rank, and the pivot vector, piv, can be retrieved from the feterm (fixed-effects term) object. In the current release (v3.8.0) this is the first element of the feterms field in the fitted model:
julia> first(m1.feterms).rank
32
julia> show(first(m1.feterms).piv)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
(When the fixed-effects model matrix is deemed full rank the pivot vector is always 1:size(X, 2).)
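When the matrix is not full rank, the two values can be combined to name the redundant columns. The sketch below assumes the usual convention for pivoted factorizations, namely that the first rank entries of piv index the retained columns and the remaining entries index the dropped ones. The column names, rank and pivot here are hypothetical stand-ins, not output from a real fit.

```julia
# Hypothetical stand-ins for the coefficient names and for the rank and piv
# fields of the fixed-effects term.
colnames = ["(Intercept)", "a", "b", "c: B", "d"]
r   = 4                     # stand-in for the stored rank
piv = [1, 2, 3, 5, 4]       # stand-in for the stored pivot vector

# Assumed convention: leading r pivots are kept, trailing pivots are dropped.
kept    = colnames[piv[1:r]]
dropped = colnames[piv[(r + 1):end]]

println("kept:    ", kept)
println("dropped: ", dropped)
```

This identifies which columns were judged redundant, but not which other columns they depend on; that still requires looking at the data.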
In the development version of MixedModels, v4.0.0-DEV, replace first(m1.feterms) by m1.feterm.