Any Julia equivalent to R's mgcv package, or to mixed-effects models for data larger than memory?

Juan · June 21, 2018, 3:52pm

Hello.

I’m interested in fitting mixed-effects models to large datasets that don’t fit in memory.
R’s lme4 is too slow and doesn’t work once the dataset is large (even just a fraction of your RAM).
speedglm and mgcv are a bit faster but still have problems.

I’ve decided to move to Julia to try to find a better option.
MixedModels.jl is similar to lme4.

Does Julia have something more like mgcv, faster and able to fit generalized additive mixed models?
Or something able to fit mixed-effects models to datasets of around 50 GB (on a computer with 12 GB of RAM)?
I mean without loading everything into memory, automatically streaming to disk as necessary.

Another option would be Spark or Flink. They work with very large datasets, but I don’t think they have any implementation of mixed-effects models.

dmbates · June 22, 2018, 5:50pm

You may want to check out https://github.com/linkedin/photon-ml

I haven’t used it myself but they claim to be able to work with very large data sets.

dmbates · June 22, 2018, 5:54pm

By the way, although MixedModels.jl is similar in design to lme4 (not surprising given my involvement in both projects), it is much more careful about storage usage and generally much faster.

Juan · November 18, 2018, 6:14pm

Unfortunately, it doesn’t support random-effects models, only fixed effects.

What about OnlineStats.jl or JuliaDB? Is it possible to use them to fit random-effects models?

Yifan_Liu · November 18, 2018, 9:50pm

In SAS this can be done, but rather slowly. In R, sparklyr can do regressions. For fixed effects, you can add group dummy variables yourself. I’m not sure about out-of-memory fitting of random-effects models.
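To make the dummy-variable suggestion concrete, here is a minimal sketch in Python/NumPy (my choice for illustration; `fe_dummy_ols` is a hypothetical helper, not part of any package mentioned in this thread). It fits y = a_g + b*x by ordinary least squares with one indicator column per group. The design matrix gains one column per group, so with many groups it grows quickly, which is part of why this approach struggles on large data.

```python
import numpy as np


def fe_dummy_ols(y, x, groups):
    """Hypothetical helper: fit y = a_g + b*x by OLS with one dummy per group."""
    levels, codes = np.unique(groups, return_inverse=True)
    n = len(y)
    # One indicator column per group level. Each group gets its own
    # intercept a_g, so no separate global intercept is included
    # (it would be collinear with the dummies).
    dummies = np.zeros((n, len(levels)))
    dummies[np.arange(n), codes] = 1.0
    X = np.column_stack([x, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    slope, intercepts = coef[0], coef[1:]
    return slope, dict(zip(levels.tolist(), intercepts.tolist()))
```

For example, with `groups = np.repeat(np.arange(5), 40)` and noise-free `y = 2.0 + 1.5 * groups + 3.0 * x`, the returned slope is 3.0 and each group intercept is 2.0 + 1.5 * g, up to rounding.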

xiaodai · November 18, 2018, 10:40pm

Try looking into OnlineStats.jl. It can fit GLMs.
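The reason streaming approaches can handle data larger than RAM is that some models only need small summary statistics. For a linear model, X'X and X'y are p-by-p and length p no matter how many rows there are, so they can be accumulated chunk by chunk. The Python sketch below illustrates that general idea; it is not OnlineStats.jl's API, `StreamingOLS` and `chunks` are hypothetical names, and it covers fixed effects only, not random effects.

```python
import numpy as np


class StreamingOLS:
    """Hypothetical helper: OLS from chunks, keeping only X'X and X'y in memory."""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))  # running X'X (p x p)
        self.xty = np.zeros(n_features)                 # running X'y (length p)
        self.n = 0                                      # rows seen so far

    def update(self, X_chunk, y_chunk):
        X_chunk = np.asarray(X_chunk, dtype=float)
        y_chunk = np.asarray(y_chunk, dtype=float)
        self.xtx += X_chunk.T @ X_chunk
        self.xty += X_chunk.T @ y_chunk
        self.n += len(y_chunk)
        return self

    def coef(self):
        # Solve the normal equations (X'X) b = X'y.
        return np.linalg.solve(self.xtx, self.xty)


def chunks(X, y, size):
    """Stand-in for reading a large file piece by piece."""
    for start in range(0, len(y), size):
        yield X[start:start + size], y[start:start + size]
```

In practice the chunks could come from something like `pandas.read_csv(path, chunksize=100_000)`. Solving the normal equations is less numerically stable than a QR-based fit, and a GLM would need one such pass over the data per iteratively reweighted least squares iteration.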

pdeffebach · November 19, 2018, 12:25am

Any thoughts on making some of the absorb features from reghdfe available in MixedModels.jl?

Juan · November 19, 2018, 12:55am

I couldn’t find any example of how to run regressions with random effects (repeated measures) with OnlineStats.

I have opened a thread with an example and some benchmarks.

dmbates · November 19, 2018, 4:16pm

I don’t know what that is. I’ve never used Stata.

pdeffebach · November 19, 2018, 4:41pm

As far as I can tell, it’s an algorithm described in Correia (2017), "A Feasible Estimator for Linear Models with Multi-Way Fixed Effects", and outlined here.

I recently had to estimate a daily time-series regression on 4 years of data for 110 regional units. Memory was a constraint even with lme4, though I didn’t get a chance to try MixedModels. The regression was instant with reghdfe, so I’m curious about the prospects for incorporating it into MixedModels or, even better, GLM.
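For readers unfamiliar with "absorbing" fixed effects: instead of adding a dummy column per level, the variables are demeaned within each factor, and by the Frisch–Waugh–Lovell theorem the slopes from regressing the demeaned y on the demeaned X equal those from the full dummy-variable regression. With more than one factor (say day and region), one classic way to demean is alternating projections: demean by each factor in turn until the values stop changing. The Python sketch below shows only that basic idea. It is not reghdfe's implementation, which as far as I know is more elaborate (for example, it uses acceleration techniques), and `demean_by`, `absorb` and `fe_slopes` are hypothetical helpers.

```python
import numpy as np


def demean_by(v, codes, n_levels):
    """Subtract the within-level mean of v for one factor."""
    sums = np.bincount(codes, weights=v, minlength=n_levels)
    counts = np.bincount(codes, minlength=n_levels)
    return v - (sums / counts)[codes]


def absorb(v, factors, tol=1e-10, max_iter=10_000):
    """Hypothetical helper: partial out several factors by alternating projections."""
    encoded = []
    for f in factors:
        levels, codes = np.unique(f, return_inverse=True)
        encoded.append((codes, len(levels)))
    v = np.asarray(v, dtype=float)
    for _ in range(max_iter):
        prev = v
        # One sweep: demean by each factor in turn.
        for codes, k in encoded:
            v = demean_by(v, codes, k)
        if np.max(np.abs(v - prev)) < tol:
            break
    return v


def fe_slopes(y, X, factors):
    """Hypothetical helper: slopes of y on X after absorbing the given factors."""
    y_t = absorb(y, factors)
    X_t = np.column_stack([absorb(X[:, j], factors) for j in range(X.shape[1])])
    coef, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return coef
```

Memory use here is a few vectors of length n plus per-level sums, rather than an n-by-(number of levels) dummy matrix.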
