I’m interested in fitting mixed-effects models to large datasets that don’t fit in memory.
R’s lme4 is too slow and doesn’t work if the dataset is large (even when it is only a fraction of your RAM).
speedglm and mgcv are a little bit faster but still have problems.
I’ve decided to move to Julia to try to find a better option.
MixedModels.jl is like lme4.
Does Julia have something more like mgcv, faster and able to run Generalized additive mixed models?
Or something able to fit mixed-effects models with datasets of around 50 GB (on a computer with 12 GB of RAM)?
I mean not loading everything into memory, but automatically streaming to disk as necessary.
Another option would be Spark or Flink; they work with very large datasets, but I think they don’t have any implementation of mixed-effects models.
By the way, although MixedModels.jl is similar in design to lme4 (not surprising given my involvement in both projects), it is much more careful about storage usage and generally much faster.
In SAS, this can be done, but pretty slowly. In R, sparklyr can do regressions. For fixed effects, you can add dummy group variables yourself. I’m not sure about out-of-memory fitting of random-effects models.
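To make the fixed-effects suggestion concrete, here is a minimal sketch in plain Python of fitting group fixed effects without holding the data in memory. It assumes a single regressor x and the model y = a_g + b*x, and it uses the within (demeaning) estimator, which gives the same slope as adding one dummy per group. The names `fit_within` and `demo_rows` are made up for illustration. It only keeps five running sums per group, so rows can be streamed from disk in one pass. Note this is a fixed-effects fit, not a random-effects or mixed model.

```python
from collections import defaultdict


def fit_within(rows):
    """One-pass within estimator for y = a_g + b*x (illustrative sketch).

    `rows` is any iterable of (group, x, y) tuples, e.g. a generator that
    reads a large file line by line. Memory use grows with the number of
    groups, not the number of rows.
    """
    # Per-group running sums: [n, sum x, sum y, sum x*y, sum x*x]
    stats = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    for g, x, y in rows:
        s = stats[g]
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += x * y
        s[4] += x * x

    # Pool the within-group cross-products and sums of squares.
    num = 0.0
    den = 0.0
    for n, sx, sy, sxy, sxx in stats.values():
        num += sxy - sx * sy / n
        den += sxx - sx * sx / n
    if den == 0.0:
        raise ValueError("x does not vary within any group")

    slope = num / den
    # Each group's intercept is its mean of y minus slope times its mean of x.
    intercepts = {g: (s[2] - slope * s[1]) / s[0] for g, s in stats.items()}
    return slope, intercepts


def demo_rows():
    """Tiny made-up dataset: y = a_g + 2*x with a_a = 1 and a_b = 10."""
    yield from [("a", 1.0, 3.0), ("a", 2.0, 5.0), ("a", 3.0, 7.0)]
    yield from [("b", 1.0, 12.0), ("b", 2.0, 14.0), ("b", 4.0, 18.0)]


slope, intercepts = fit_within(demo_rows())
```

With several regressors the same idea applies, but you would accumulate the demeaned cross-product matrices and solve a small linear system instead. The raw-sums formula used here can lose precision on very large or badly scaled data, so treat it as an illustration rather than production code.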
I recently had to fit a regression on a daily time series covering 4 years and 110 regional units. Memory was a constraint even with lme4, though I didn’t get a chance to try MixedModels. The regression was instant with reghdfe, so I’m curious what the prospects are for incorporating that kind of approach into MixedModels, or even better, GLM.