I used MixedModels package a couple of times in the past as a faster alternative to R’s lmer. Recently I downloaded latest official Julia version (0.6.3) and tried with some datasets. I compared the results with old MixedModels and Julia version, and I found that random effects are not matching at all (using ranef(lmeModel, named=true)). Digging little deeper i managed to match them using the following snippet:
randomEff = transpose(ranef(lmeModel, named=true))
rows, columns = names(randomEff)
randomDF = DataFrame(randomEff)
names!(randomEff.colindex, map(parse, columns))
randomEff[:store] = unique(lmeModel.trms.f)
dfCols = randomEff.colindex.names
randomEff = stack(randomEff,dfCols[1:end-1], :store)
Now random effects appear very close to old results.
The question is: Am I doing something wrong or there is an issue with random effects in new version?
Please note that I have very little experience with Julia.
I am having difficulty reproducing the problem and would appreciate a reproducible example. Either I have somehow introduced a reordering or I misunderstand the relationship between the
refs and the
levels of a
I may need help from @nalimilan if it is a misunderstanding of the innards of the
CategoricalArrays package. For some reason I am still confused by the documentation and the code in that package. The fact that this hinges on the order of
unique(lmeModel.trms.f) is very, very strange to me.
I guess that’s due to the fact that you use
refs as if its values corresponded to the order in which levels are presented, while it actually refers to the internal order of the levels (
index). You need to use
CategoricalArrays.order(x.pool)[x.refs] to get codes corresponding to user-visible ordering of levels.
I must admit that I find that nomenclature highly unintuitive. To me it is axiomatic that for categorical data the relationship
levels(f)[f.refs] == f
Would you be willing to redefine the result of
levels(f) so that this relationship does hold? Right now I don’t understand what the purpose of having a
levels function is if the result cannot be indexed by the refs.
Alternatively, would you be willing to add another extractor function, say
reflevels, that provides the levels that can be indexed by the
refs? I don’t object to having internal representations that allows for fast reordering, etc. In my application I want to work with the
refs vector because the, potentially very large, model matrices for the random effects are generated from indicators of the
refs. So really this problem for me is comparatively minor if I can just find out how to label the indicator columns.
Yeah, I know it’s somewhat confusing. The point of this level of indirection is that things like
CategoricalArray(["c", "b", "a"]) can be coded on the fly without resetting all the
refs at the end, once we realize that
levels(x) == ["a", "b", "c"] doesn’t match the order of appearance (which is used for
refs since we cannot know the order beforehand). It also allows changing the order of levels without modifying the underlying data.
levels is to return the levels in the order the user wants to see them. What you seem to be interested in is rather the internal order of levels, which is
CategoricalArrays.index(x.pool). You can always rely on the property that
CategoricalArrays.index(x.pool)[x.refs] == x.
EDIT: I’ve added a warning in the CategoricalArrays manual: https://github.com/JuliaData/CategoricalArrays.jl/pull/152