Is there a Julia equivalent to R's mgcv package, or a way to fit mixed-effects models larger than memory?


I’m interested in fitting mixed-effects models to large datasets that don’t fit in memory.
R’s lme4 is too slow and doesn’t work if the dataset is large (even a fraction of your RAM).
speedglm and mgcv are a little faster but still have problems.

I’ve decided to move to Julia to try to find a better option.
MixedModels.jl is like lme4.

Does Julia have something more like mgcv, faster and able to run Generalized additive mixed models?
Or something able to fit mixed-effects models with datasets of around 50GB (on a computer with 12GB)?
I mean not loading everything into memory, but automatically streaming to disk as necessary.

Another option would be Spark or Flink. They work with very large datasets, but I don’t think they have any implementation of mixed-effects models.

You may want to check

I haven’t used it myself but they claim to be able to work with very large data sets.

By the way, although MixedModels.jl is similar in design to lme4 (not surprising given my involvement in both projects) it is much more careful of the storage usage and generally much faster.


Unfortunately it doesn’t support random-effects models, just fixed effects.

What about OnlineStats.jl or JuliaDB, is it possible to use them to fit random effect models?

In SAS this can be done, but it is pretty slow. In R, sparklyr can do regressions. For fixed effects, you can add dummy group variables yourself. I’m not sure about out-of-memory fitting of random-effects models.
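
For a single grouping factor, adding a dummy per group gives the same slope as subtracting each group's mean from x and y first, and the second form only needs running sums, so the data can be read from disk twice instead of held in memory. Here is a minimal sketch of that arithmetic in plain Python. The function name `within_slope` and the toy rows are made up for illustration, and this covers a fixed group intercept only, not random effects.

```python
from collections import defaultdict

def within_slope(rows):
    """Slope of y on x after absorbing one intercept per group.

    rows: a re-readable sequence of (group, x, y). With real data this
    would be two passes over a file; a list stands in for it here.
    """
    n = defaultdict(int)
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    # Pass 1: per-group counts and sums (memory grows with groups, not rows).
    for g, x, y in rows:
        n[g] += 1
        sum_x[g] += x
        sum_y[g] += y
    # Pass 2: accumulate cross-products of the group-demeaned values.
    sxx = 0.0
    sxy = 0.0
    for g, x, y in rows:
        dx = x - sum_x[g] / n[g]
        dy = y - sum_y[g] / n[g]
        sxx += dx * dx
        sxy += dx * dy
    return sxy / sxx

# Toy data: y = 2*x + group intercept (1 for "a", 11 for "b").
rows = [("a", 0, 1), ("a", 1, 3), ("a", 2, 5),
        ("b", 0, 11), ("b", 1, 13), ("b", 2, 15)]
slope = within_slope(rows)
```

The point of the two-pass layout is that nothing proportional to the number of rows is ever stored.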

Try looking into OnlineStats.jl. It can fit GLMs.
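
To show why single-pass regression works on data larger than memory, here is a rough sketch of the general idea in plain Python: accumulate X'X and X'y chunk by chunk, then solve the small p-by-p system at the end. This is not OnlineStats.jl's API. The names `accumulate` and `solve` and the two toy chunks are assumptions for illustration, and it handles fixed coefficients only.

```python
def accumulate(chunks, p):
    """Accumulate X'X (p x p) and X'y (length p) one chunk at a time."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for rows in chunks:                  # each chunk is a block of (x, y) rows
        for x, y in rows:
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty

def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):         # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

# Two hypothetical chunks standing in for blocks read from disk.
# Each x is [1, value] (intercept column first); y = 1 + 2*value exactly.
chunks = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
xtx, xty = accumulate(chunks, p=2)
beta = solve(xtx, xty)
```

Only the p-by-p and length-p accumulators stay in memory, so the row count does not matter. Solving the normal equations directly like this can be numerically poor for ill-conditioned designs, so treat it as an illustration rather than a recommendation.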

Any thoughts on making some of the absorb features from reghdfe available in MixedModels.jl?

I couldn’t find any example of how to run regressions with random effects (repeated measures) with OnlineStats.

I have opened a thread with an example and some benchmarks

I don’t know what that is. I’ve never used Stata

As far as I can tell it’s an algorithm described in Correia (2017), “A Feasible Estimator for Linear Models with Multi-Way Fixed Effects”, and outlined here.

I recently had to fit a regression to a daily time series covering 4 years and 110 regional units. Memory was a constraint even with lme4, though I didn’t get a chance to try MixedModels. The regression was instant with reghdfe, so I’m curious what the prospects are for incorporating it into MixedModels or, even better, GLM.
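
For anyone who hasn't used absorb, the simplest version of the idea is alternating demeaning: subtract the unit means, then the day means, and repeat until nothing changes, then regress the residualized y on the residualized x. The sketch below is plain Python and shows only that basic scheme, not the estimator from the paper or reghdfe's implementation. The names `demean_by` and `absorb`, the fixed sweep count, and the toy panel are all made up for illustration.

```python
from collections import defaultdict

def demean_by(values, keys):
    """Subtract from each value the mean of its group (given by keys)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for v, k in zip(values, keys):
        total[k] += v
        count[k] += 1
    return [v - total[k] / count[k] for v, k in zip(values, keys)]

def absorb(values, unit, day, sweeps=50):
    """Alternate unit-demeaning and day-demeaning for a fixed number of sweeps."""
    out = list(values)
    for _ in range(sweeps):
        out = demean_by(out, unit)
        out = demean_by(out, day)
    return out

# Toy balanced panel: 2 units x 3 days, y = 2*x + unit effect + day effect.
unit = ["a", "a", "a", "b", "b", "b"]
day = [1, 2, 3, 1, 2, 3]
x = [0.0, 1.0, 4.0, 2.0, 0.0, 1.0]
unit_fe = {"a": 5.0, "b": -3.0}
day_fe = {1: 0.5, 2: -1.0, 3: 2.0}
y = [2.0 * xi + unit_fe[u] + day_fe[d] for xi, u, d in zip(x, unit, day)]

# Residualize both sides, then run the one-variable regression.
xr = absorb(x, unit, day)
yr = absorb(y, unit, day)
slope = sum(a * b for a, b in zip(xr, yr)) / sum(a * a for a in xr)
```

No dummy columns are ever built, which is why memory stays flat as the number of units and days grows. A real implementation would check convergence instead of running a fixed number of sweeps.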
