I am trying to fit a fairly complicated mixed model using Julia and the MixedModels package, on a dataset that has ~3.5 million rows (see screen shot for model specification and errors).
The model is nested, with the random effect grouping factors V*, S* and M* representing 3 different crossed effects, where * represents the level of the hierarchy.
I simplified the model to troubleshoot, keeping only the V1, S1, and M1 terms, but got the same error. Interestingly, the model will fit without errors if I keep V1 and S1, V1 and M1, or S1 and M1 - just not all three. The model will run with V1, S1 and M1 using lme4 in R though. Being new to Julia, I don’t know how else to go about troubleshooting this error message, so I would appreciate any help anybody can offer.
I am running Julia v0.5.0 on Windows 7, using MixedModels v0.7.5.
If @dmbates doesn’t reply here in a few days, I think you should file an issue on GitHub. It would be great if you could provide a toy data set to reproduce the problem.
I’ll reply once I get to the office.
It could have been predicted: if you leave one case uncovered, someone will discover it. The only time that such a method for `downdate!` will be called is when there are three or more nested grouping factors.
There is a simple but slow method of doing this, which I will add to the package today.
I am in the process of refactoring the `MixedModels` package to take advantage of the `LinearAlgebra` package by @andreasnoack, after which calls to `downdate!` will be replaced by calls to `A_mul_Bc!`. I just mention this in case someone goes looking for these methods in the future.
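For anyone reading along who has not met the term: a downdate in this sense is, roughly, an in-place subtraction of a matrix product from a block, something like C := C - A*B'. The sketch below is not the package's code. The names `slow_downdate!` and `blas_downdate!` are hypothetical, the transpose convention is an assumption, and it is written for current Julia (1.3 or later), where the old `A_mul_Bc!(C, A, B)` is spelled `mul!(C, A, B')`.

```julia
using LinearAlgebra

# Hypothetical "simple but slow" downdate: C := C - A * B', element by element.
function slow_downdate!(C::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    m, n = size(C)
    k = size(A, 2)
    (size(A, 1) == m && size(B, 1) == n && size(B, 2) == k) ||
        throw(DimensionMismatch("need C (m×n), A (m×k), B (n×k)"))
    for j in 1:n, i in 1:m
        s = zero(eltype(C))
        for l in 1:k
            s += A[i, l] * B[j, l]   # (A * B')[i, j] = sum over l of A[i, l] * B[j, l]
        end
        C[i, j] -= s
    end
    return C
end

# Same update through the in-place multiply: C := -1 * A * B' + 1 * C.
blas_downdate!(C, A, B) = mul!(C, A, B', -1.0, 1.0)

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3×2
B = [1.0 0.0; 0.0 1.0]            # 2×2 identity, so A * B' == A
C1 = ones(3, 2)
C2 = ones(3, 2)
slow_downdate!(C1, A, B)
blas_downdate!(C2, A, B)
```

Both calls overwrite their first argument, so `C1` and `C2` end up holding the same downdated block; the second form hands the work to the library's matrix multiply instead of explicit loops.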
By the way, what do you mean by the grouping factors `V*`, `S*` and `M*`? Do, for example, `S1`, `S2`, `S3`, and `S4` represent indicators of the levels of a factor `S`? If so, the 4 terms `(1 | S1) + (1 | S2) + (1 | S3) + (1 | S4)` should be collapsed to `(1 | S)`. If they are not related then I can’t make sense of what it would mean for `V*`, `S*`, and `M*` to be nested.
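To make the first reading concrete: if S1 to S4 were 0/1 indicator columns for the four levels of one factor, they could be folded back into a single label column before fitting. A minimal sketch in plain Julia, with made-up data and a hypothetical helper name `collapse_indicators`:

```julia
# Hypothetical helper: fold 0/1 indicator columns into one vector of level labels.
# Assumes every row has exactly one indicator equal to 1.
function collapse_indicators(labels::Vector{String}, indicators::Vector{Vector{Int}})
    length(labels) == length(indicators) ||
        throw(ArgumentError("need one label per indicator column"))
    n = length(first(indicators))
    all(length(col) == n for col in indicators) ||
        throw(ArgumentError("indicator columns must have equal length"))
    S = Vector{String}(undef, n)
    for row in 1:n
        hits = findall(col -> col[row] == 1, indicators)
        length(hits) == 1 ||
            throw(ArgumentError("row $row does not have exactly one indicator set"))
        S[row] = labels[hits[1]]
    end
    return S
end

# Made-up data: five observations, four levels.
S1 = [1, 0, 0, 0, 1]
S2 = [0, 1, 0, 0, 0]
S3 = [0, 0, 1, 0, 0]
S4 = [0, 0, 0, 1, 0]
S = collapse_indicators(["s1", "s2", "s3", "s4"], [S1, S2, S3, S4])
```

The single column `S` would then enter the model as `(1 | S)` in place of the four separate terms.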
P.S. Were you actually able to fit a model of this complexity in lme4?
Thank you for your reply - I will check for package updates later in the week and try to rerun the models then.
Answering your questions:
- Sorry, my explanations for what V*, S*, and M* represent weren’t clear. Sticking with the S* terms as an example, I’m attempting to fit a nested model of the form (1 | A/B/C/D) in lme4 notation. Based on the feedback you gave when I brought this up in a GitHub issue earlier in the week, I derived the S* variables by concatenating A, B, C, and D: S1=A, S2=AB, S3=ABC, S4=ABCD.
- I regularly fit simpler versions of this model with lme4. Specifically, I don’t include the (1|M2)…(1|M6) terms, and I assume 0 correlation between all of the random effects. I also have to stratify the data by the V1 variable and fit ~50 different models without the V1 level random effects if I hope to get the model fit in any reasonable amount of time - hence my motivation to move this model over to Julia.
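The concatenation described in the first point can be sketched roughly as follows. The data, the "_" separator, and the helper name `nested_labels` are all made up for illustration; the separator is only there so that labels from different levels cannot run together ambiguously.

```julia
# Hypothetical helper: build nested grouping labels from the columns of a hierarchy.
# Given columns [A, B, C, D], returns [A, A_B, A_B_C, A_B_C_D], each built row by row.
function nested_labels(columns::Vector{Vector{String}}; sep::String = "_")
    n = length(first(columns))
    all(length(col) == n for col in columns) ||
        throw(ArgumentError("columns must have equal length"))
    out = Vector{Vector{String}}()
    for depth in 1:length(columns)
        # Label at this depth = the first `depth` columns joined together.
        push!(out, [join((columns[k][row] for k in 1:depth), sep) for row in 1:n])
    end
    return out
end

# Made-up data: three observations.
A = ["a1", "a1", "a2"]
B = ["b1", "b2", "b1"]
C = ["c1", "c1", "c1"]
D = ["d1", "d2", "d1"]
S1, S2, S3, S4 = nested_labels([A, B, C, D])
```

Note that b1 under a1 and b1 under a2 receive distinct labels in `S2`, which is what lets (1 | S2) play the role of the B-within-A term of (1 | A/B/C/D).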
I pulled down the new `src/cfactor.jl` from GitHub yesterday afternoon and ran the simplest version of the model that was giving the `downdate!` error. The good news is that the model ran without any errors. The bad news is that it was actually slower than running the same model in lme4 (52 minutes vs 34 minutes).
Do you think the future replacement of `downdate!` with `A_mul_Bc!` will speed things up, or is the handling of models with 3+ nested groups an inherent shortcoming in Julia?
I’m sure the Julia version can be made faster. I haven’t started trying to make this code faster by profiling it and seeing where the bottlenecks are. Because I am refactoring the code anyway I don’t expect to do much tuning on this version,