[ANN] Metida.jl: mixed-effects models fitting package

Metida.jl is a Julia package for fitting mixed-effects models with flexible covariance structures.

Implemented covariance structures:

  • Scaled Identity (SI)
  • Diagonal (DIAG)
  • Autoregressive (AR)
  • Heterogeneous Autoregressive (ARH)
  • Compound Symmetry (CS)
  • Heterogeneous Compound Symmetry (CSH)
  • Autoregressive Moving Average (ARMA)

All structures can be applied to the random (G) or repeated (R) part of the variance-covariance matrix (V), where:

V = ZGZ' + R
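
As a minimal sketch of how V is assembled from these parts, the toy example below builds V for a single subject using only the LinearAlgebra standard library. All numbers (the design matrix Z, the CS parameters `sigma2` and `rho`, the unit residual variance) are hypothetical values chosen for illustration; this is not how Metida.jl builds the matrix internally.

```julia
using LinearAlgebra

# Hypothetical toy values: one subject, three observations,
# two random-effect levels (e.g. two formulations).
Z = [1.0 0.0;
     0.0 1.0;
     1.0 0.0]

# G with compound symmetry (CS): common variance and one covariance.
sigma2 = 2.0
rho = 0.5
cov12 = sigma2 * rho
G = [sigma2 cov12;
     cov12  sigma2]

# R as scaled identity (SI) with residual variance 1.0.
R = Diagonal(fill(1.0, 3))

# Marginal variance-covariance matrix of the three observations.
V = Z * G * Z' + R

println(V)
```

Observations 1 and 3 share the same random-effect level, so their covariance equals the full G-side variance, while the R part only adds to the diagonal.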

Documentation available here.

v0.2.3

  • Documentation
  • FullDummyCoding fix
  • FunctionTerm fix

Version 0.3.0 released.

The random/repeated model syntax is now close to the classic R/MixedModels style:

    using Metida

    # df0 is the data table holding var, sequence, period, formulation and subject
    lmm = LMM(@formula(var ~ sequence + period + formulation), df0;
        random = VarEffect(@covstr(formulation|subject), CSH),
        repeated = VarEffect(@covstr(formulation|subject), DIAG),
    )
    fit!(lmm)

The subject keyword is gone: the blocking factor is now constructed automatically. Plus other minor changes and documentation…

Version 0.4.0 released.

New covariance types:

  • Toeplitz
  • ToeplitzParameterized
  • CustomCovarianceType

I think CustomCovarianceType is a really interesting feature. Users can specify their own methods to construct the covariance matrix, separately for the G and R parts. Read more in the docs.

Version 0.6.0 released.

New covariance types:

  • HeterogeneousToeplitz
  • HeterogeneousToeplitzParameterized

Some changes in API, rho-link function, validation & documentation.

Hello.

How does it compare to the MixedModels.jl package in terms of memory usage, speed and capabilities?

Hello! MixedModels.jl is much faster and consumes less memory. If MixedModels.jl is applicable to your task, it is better to use MixedModels.jl. If you need the Satterthwaite ddf approximation, analysis with repeated measurements, or some nontrivial covariance structures, Metida.jl solves these problems. I couldn’t find MixedEffects.jl; maybe you mean MixedModels.jl. Anyway, if I missed this package, please point me to it.

Version 0.7.1 released.

Better performance (about twice as fast in some tasks), a dof_satter method for the multidimensional case, minor changes, a bugfix for LMM show, documentation, more stable tests.

Version 0.9.1 released.

  • documentation fix
  • add test
  • add experimental Type III Tests of Fixed Effects

MetidaNLopt, MetidaCu updated for v0.9.0.

Version 0.12.0 released.

The custom covariance structure was redesigned. It is now easy to implement: just add one struct and 2-3 methods. Docs here. Better performance with multithreading. Minor changes in output.

Metida Version 0.12.2 will be released soon.

List of implemented covariance structures:

  • Scaled Identity (SI)
  • Diagonal (DIAG)
  • Autoregressive (AR)
  • Heterogeneous Autoregressive (ARH)
  • Compound Symmetry (CS)
  • Heterogeneous Compound Symmetry (CSH)
  • Autoregressive Moving Average (ARMA)
  • Toeplitz (TOEP)
  • Toeplitz Parameterized (TOEPP)
  • Heterogeneous Toeplitz (TOEPH)
  • Heterogeneous Toeplitz Parameterized (TOEPHP)
  • Spatial Exponential (SPEXP)
  • Spatial Power (SPPOW)
  • Spatial Gaussian (SPGAU)
  • Custom Covariance Type

Now, ‘rand’ is available for generating a random response vector from the fitted model (docs).

Metida Version 0.12.4 released.

Better performance;
Better ‘rand’ and ‘rand!’ implementation;
Estimates table for all coefficients.

Made some preparations for parametric bootstrap and multiple imputations.

Happy New Year! :smile_cat: :snowman_with_snow: :tada: :fireworks:

Out of curiosity, why are there hard-coded limitations, e.g. 160k observations? Doesn’t that depend on the hardware?

Hi! I don’t think these are ‘hard-coded limitations’. :smile_cat: When the model is constructed, the full Z matrix is used. This means that if you have 200k observations and 50k subjects, the Z matrix will be 200k × 50k (10,000 million numbers), and this matrix will not fit in memory. I think this problem can be solved in some cases. At the moment, StatsModels has no easy way to get a sparse model matrix for categorical terms with many levels.
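
The arithmetic behind that estimate can be checked with a few lines of plain Julia. This sketch assumes a dense Z stored as Float64 (8 bytes per entry) and a 64-bit machine; the variable names are illustrative only.

```julia
# Figures from the example above: 200k observations, 50k subjects.
n_obs = 200_000
n_subjects = 50_000

# A dense Z has one entry per (observation, subject) pair.
dense_entries = n_obs * n_subjects

# Assumption: Float64 storage, 8 bytes per entry.
dense_bytes = dense_entries * sizeof(Float64)
dense_gb = dense_bytes / 1e9

println("entries in dense Z: ", dense_entries)
println("dense Z size in GB: ", dense_gb)
```

Under these assumptions the dense matrix alone needs about 80 GB, so the practical limit depends on available RAM and on the number of levels rather than on a fixed observation count.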

OK, so there aren’t any hard-coded limits… I think the way it is written now isn’t very clear… it really seems like the stated figures are fixed problem limits, like in some commercial software…
You could write instead something like “a standard PC with xxx GB of RAM can efficiently process problems up to …”

I think I should add some clarification to the docs. Also, in some cases it is very difficult to predict how many observations can be handled because, for example, a model with two random factors of 100k levels each will need twice as much memory as a model with one. So it is not only a data-dependent problem. The solution is a sparse Z matrix, which requires a custom implementation of some StatsModels methods; this is part of future work.
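
To illustrate why a sparse Z helps, here is a minimal sketch using the SparseArrays standard library. It assumes the simplest case, a single grouping factor with one indicator column per subject, so each row of Z has exactly one nonzero; the `subject` vector is hypothetical toy data, and this is not the StatsModels-based implementation mentioned above.

```julia
using SparseArrays

# Hypothetical subject index for each of six observations.
subject = [1, 1, 2, 3, 3, 3]
n_obs = length(subject)
n_subj = maximum(subject)

# Indicator matrix: row i has a single 1.0 in column subject[i].
Z = sparse(collect(1:n_obs), subject, ones(n_obs), n_obs, n_subj)

# Only n_obs values are stored instead of n_obs * n_subj.
println("stored entries: ", nnz(Z), " of ", n_obs * n_subj)

# Column sums give the number of observations per subject.
obs_per_subject = vec(sum(Z, dims=1))
println(obs_per_subject)
```

With this layout the number of stored values grows with the number of observations, not with observations times levels.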

@sylvaticus If you don’t need structured R-side covariance, then MixedModels.jl already uses optimized sparse methods for Z matrices. We don’t have support for many restricted G-side (random effects) covariance structures out of the box, but they’re not too hard to implement. The “DIAG” case here corresponds to zerocorr in MixedModels; the “SI” case here could be implemented without too much difficulty. One of my TODOs is to document how to create new types for restricted G-side covariance.
