Hi there,
I use mixed models on a large file (500000 rows).
My model formula looks like this: Y ~ 0 + num1:factor1 + num1:factor2 + num2:factor3 + factor4 + (0 + num3|subject) + (0 + num4|subject) + (1|subject),
where num1..num4 are numeric variables and factor1..factor4 are categorical variables (factors).
Since categorical variables have many unique levels, the fixed effects matrix is very sparse (sparsity ~0.9).
Fitting a model with such a matrix, if it is handled as dense, requires a lot of time and RAM.
I had the same problem with linear regression.
My dense matrix was 20GB, but when I converted it to sparse it became only 35MB.
So, I implemented the regression in R using the following functions:
sparse.model.matrix (to create a sparse model/design matrix) and
MatrixModels:::lm.fit.sparse (to fit a sparse matrix and calculate coefficients).
Can I apply a similar approach to mixed models and implement it using Julia packages?
What functions / packages can I use to implement this?
That is, my main question is whether it is possible to implement mixed models with sparse matrices?
What package/functions should I use to create X and Z sparse model matrices?
Then, which function should I use for fitting the model with sparse matrices to get coefficients?
I would be very grateful for any help with this!
There is some provision in the MixedModels package for working with sparse model matrices for the fixed-effects parameters. I haven’t tried it out myself and am not sure how well it is integrated with the StatsModels package which does the conversion from formula/data to model matrices. Perhaps @palday or @dave.f.kleinschmidt may be able to provide more detail on how easy or difficult it would be.
@dmbates Thank you for the answer. I’m using the MixedModels package now, but it looks like it handles the X matrix as dense, since model fitting consumes an enormous amount of RAM.
FixedEffectModels doesn’t, and note that “fixed effects” in econometrics has a different meaning than elsewhere (it’s “categorical fixed effect” in the terminology of MixedModels).
The Z matrices in MixedModels are already sparse (that’s about half the magic of the MixedModels approach compared to other approaches: @dmbates developed a way to express fitting as a sparse penalized least squares problem, while most techniques depend on a dense generalized least squares problem). The X matrix can be sparse, but the formula interface won’t generate it. If you call the LinearMixedModel constructor directly with a sparse X that you constructed by hand, it will work though. The support for sparse FeMat was one of the changes in MixedModels 3.0.
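To make "a sparse X that you constructed by hand" a bit more concrete, here is a minimal sketch of building the block for a single term such as num1:factor1 with the SparseArrays standard library. The toy vectors `num1` and `factor1` and the helper names `lvls` and `colof` are made up for illustration; this is not how StatsModels or MixedModels build their matrices internally.

```julia
using SparseArrays

# Hypothetical toy data standing in for num1 and factor1 from the formula.
num1    = [0.5, 1.2, -0.3, 2.0, 0.7]
factor1 = ["a", "b", "a", "c", "b"]

# Map each factor level to a column index.
lvls  = sort(unique(factor1))
colof = Dict(l => j for (j, l) in enumerate(lvls))

# For a num:factor interaction each row has exactly one stored entry:
# row i, the column of that row's level, with value num1[i].
rows = collect(1:length(num1))
cols = [colof[l] for l in factor1]
X    = sparse(rows, cols, num1, length(num1), length(lvls))
```

Blocks for several terms can be joined with `hcat`, which keeps the result sparse. Passing the resulting X to the LinearMixedModel constructor is left out here, since the exact arguments depend on the MixedModels version you have installed.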
I should also emphasize @Antonina_Klyuyeva that if you run into problems but have a good minimal working example then we’re (well, I’m) happy to help. And a good MWE is great for improving our support and expanding our tests. Note that your MWE can also include real data, if you’re able to share that.