Implement Mixed Models with sparse X and Z matrices

Hi there,
I use mixed models on a large file (500000 rows).
My model formula looks like this:
Y ~ 0 + num1:factor1 + num1:factor2 + num2:factor3 + factor4 + (0 + num3|subject) + (0 + num4|subject) + (1|subject),
where num* are numeric variables and factor* are categorical variables (factors).

Since the categorical variables have many unique levels, the fixed-effects matrix is very sparse (sparsity ~0.9).
Fitting such a matrix when it is handled as dense requires a lot of time and RAM.

I had the same problem with linear regression.
My dense matrix was 20GB, but when I converted it to sparse it became only 35MB.
So I implemented the regression in R using the following functions:

  1. sparse.model.matrix (to create a sparse model/design matrix) and
  2. MatrixModels:::lm.fit.sparse (to fit a sparse matrix and calculate coefficients).
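
For readers on the Julia side, the memory gap described above can be reproduced with the SparseArrays standard library. This is only a rough sketch: the sizes `n` and `k` and the random level codes are made-up toy values, not the data from the question.

```julia
using SparseArrays

# Hypothetical toy sizes, much smaller than the 500000-row data in the question
n, k = 10_000, 500
codes = rand(1:k, n)                 # integer level codes of one factor

# Indicator (one-hot) design matrix: row i has a single 1 in column codes[i]
Xs = sparse(1:n, codes, ones(n), n, k)

# Bytes a dense Float64 matrix of the same shape would need (computed, not allocated)
dense_bytes = n * k * sizeof(Float64)
sparse_bytes = Base.summarysize(Xs)

println("dense:  ", dense_bytes, " bytes")
println("sparse: ", sparse_bytes, " bytes")
```

Only the `n` nonzero entries and their indices are stored, so the sparse size grows with the number of rows rather than with rows times levels.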

Can I apply a similar approach to mixed models and implement it with Julia packages?
What functions / packages can I use to implement this?

That is, my main question is whether it is possible to implement mixed models with sparse matrices?
What package/functions should I use to create X and Z sparse model matrices?
Then, which function should I use for fitting the model with sparse matrices to get coefficients?

I would be very-very grateful for any help with this!

There is some provision in the MixedModels package for working with sparse model matrices for the fixed-effects parameters. I haven’t tried it out myself and am not sure how well it is integrated with the StatsModels package, which does the conversion from formula/data to model matrices. Perhaps @palday or @dave.f.kleinschmidt may be able to provide more detail on how easy or difficult it would be.

You should try with

@Juan Can this package handle mixed-model formulas (I mean the random-effects part)?

@dmbates Thank you for the answer. I’m using the MixedModels package now, but it looks like it handles the X matrix as dense, since model fitting consumes an enormous amount of RAM.

FixedEffectModels doesn’t. Note that “fixed effects” in econometrics has a different meaning than elsewhere (it’s “categorical fixed effect” in the terminology of MixedModels).

The Z matrices in MixedModels are already sparse. That’s about half the magic of the MixedModels approach compared to others: @dmbates developed a way to express fitting as a sparse penalized least squares problem, while most techniques depend on a dense generalized least squares problem. The X matrix can be sparse, but the formula interface won’t generate it. If you call the LinearMixedModel constructor directly with a sparse X that you constructed by hand, it will work though. The support for sparse FeMat was one of the changes in MixedModels 3.0.
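
As a rough illustration of what “a sparse X constructed by hand” could look like, here is a minimal sketch using only the SparseArrays standard library. The toy vectors `num1`, `factor1` and `factor4` are hypothetical stand-ins for the columns in the question, with factors already converted to integer level codes, and full indicator coding for `factor4` is an assumption. The sketch stops at building X; the exact arguments the LinearMixedModel constructor expects are not shown in this thread.

```julia
using SparseArrays

# Hypothetical toy data standing in for num1, factor1 and factor4
n = 6
num1    = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
factor1 = [1, 2, 1, 3, 2, 3]   # integer level codes, 3 levels
factor4 = [1, 1, 2, 2, 1, 2]   # integer level codes, 2 levels

# num1:factor1 -> one column per level, holding num1 in the rows where that level occurs
X_int = sparse(1:n, factor1, num1, n, 3)

# factor4 without an intercept -> one indicator column per level
X_f4 = sparse(1:n, factor4, ones(n), n, 2)

# Concatenating sparse blocks column-wise keeps the result sparse
X = hcat(X_int, X_f4)
```

Each row of X carries one nonzero per term, so storage grows with the number of rows and terms rather than with the total number of factor levels.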


Relevant (merged) pull request: https://github.com/JuliaStats/MixedModels.jl/pull/309

This functionality is very new and the people using it so far are close collaborators, so we haven’t yet written good documentation on it.

EDIT:

One more tip: use Grouping() pseudo-contrasts when you have many levels of the grouping variable: contrasts=Dict(:subject => Grouping())
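
A minimal sketch of that tip in a complete call, assuming the MixedModels package is installed. The simulated columns `y`, `x` and `subject` are invented purely for illustration and have nothing to do with the data in the question.

```julia
using MixedModels, Random

# Simulated toy data (hypothetical): 200 subjects, 5 observations each
rng = MersenneTwister(1)
nsubj, nper = 200, 5
subject = repeat(string.("s", 1:nsubj), inner = nper)
x = randn(rng, nsubj * nper)
b = repeat(randn(rng, nsubj), inner = nper)      # per-subject random intercepts
y = 1 .+ 2 .* x .+ b .+ randn(rng, nsubj * nper)
data = (y = y, x = x, subject = subject)

# Grouping() marks subject as a grouping variable, so no contrast matrix is built for it
m = fit(MixedModel, @formula(y ~ 1 + x + (1 | subject)), data;
        contrasts = Dict(:subject => Grouping()))
```

With only 200 levels the difference is negligible; the tip matters when the grouping variable has very many levels.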

I should also emphasize, @Antonina_Klyuyeva, that if you run into problems but have a good minimal working example (MWE), then we’re (well, I’m) happy to help. A good MWE is also great for improving our support and expanding our tests. Note that your MWE can include real data, if you’re able to share that.