Hi,
I used MixedModels package a couple of times in the past as a faster alternative to R’s lmer. Recently I downloaded latest official Julia version (0.6.3) and tried with some datasets. I compared the results with old MixedModels and Julia version, and I found that random effects are not matching at all (using ranef(lmeModel, named=true)). Digging little deeper i managed to match them using the following snippet:
randomEff = transpose(ranef(lmeModel, named=true)[1])
rows, columns = names(randomEff)
randomDF = DataFrame(randomEff)
names!(randomEff.colindex, map(parse, columns))
randomEff[:store] = unique(lmeModel.trms[1].f)
dfCols = randomEff.colindex.names
randomEff = stack(randomEff,dfCols[1:end-1], :store)
Now random effects appear very close to old results.
The question is: Am I doing something wrong or there is an issue with random effects in new version?
Please note that I have very little experience with Julia.
I am having difficulty reproducing the problem and would appreciate a reproducible example. Either I have somehow introduced a reordering or I misunderstand the relationship between the refs
and the levels
of a CategoricalArray
.
I may need help from @nalimilan if it is a misunderstanding of the innards of the CategoricalArrays
package. For some reason I am still confused by the documentation and the code in that package. The fact that this hinges on the order of unique(lmeModel.trms[1].f)
is very, very strange to me.
I guess that’s due to the fact that you use refs
as if its values corresponded to the order in which levels are presented, while it actually refers to the internal order of the levels (index
). You need to use CategoricalArrays.order(x.pool)[x.refs]
to get codes corresponding to user-visible ordering of levels.
Okay, thanks.
I must admit that I find that nomenclature highly unintuitive. To me it is axiomatic that for categorical data the relationship
levels(f)[f.refs] == f
should hold.
Would you be willing to redefine the result of levels(f)
so that this relationship does hold? Right now I don’t understand what the purpose of having a levels
function is if the result cannot be indexed by the refs.
Alternatively, would you be willing to add another extractor function, say reflevels
, that provides the levels that can be indexed by the refs
? I don’t object to having internal representations that allows for fast reordering, etc. In my application I want to work with the refs
vector because the, potentially very large, model matrices for the random effects are generated from indicators of the refs
. So really this problem for me is comparatively minor if I can just find out how to label the indicator columns.
1 Like
Yeah, I know it’s somewhat confusing. The point of this level of indirection is that things like CategoricalArray(["c", "b", "a"])
can be coded on the fly without resetting all the refs
at the end, once we realize that levels(x) == ["a", "b", "c"]
doesn’t match the order of appearance (which is used for refs
since we cannot know the order beforehand). It also allows changing the order of levels without modifying the underlying data.
The purpose levels
is to return the levels in the order the user wants to see them. What you seem to be interested in is rather the internal order of levels, which is CategoricalArrays.index(x.pool)
. You can always rely on the property that CategoricalArrays.index(x.pool)[x.refs] == x
.
EDIT: I’ve added a warning in the CategoricalArrays manual: https://github.com/JuliaData/CategoricalArrays.jl/pull/152