MixedModels v2.0.0

I plan to release v"2.0.0" of the MixedModels package on Aug. 1. This version is compatible with StatsModels v"0.6.0" (a.k.a. Terms2.0), which allows for much greater flexibility in formula specification. The internals have also been rewritten extensively, but that should not be obvious to users.

If you use the MixedModels package, I would appreciate it if you could install and test the master branch and let me know of any difficulties in advance of the release.

I encountered some version conflicts with Distributions when I tried to switch to master:

(v1.1) pkg> add MixedModels #master
  Updating git-repo `https://github.com/dmbates/MixedModels.jl.git`
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package Distributions [31c24e10]:
 Distributions [31c24e10] log:
 ├─possible versions are: [0.1.0-0.1.4, 0.2.0-0.2.13, 0.3.0, 0.6.4-0.6.7, 0.7.0-0.7.6, 0.8.0-0.8.10, 0.9.0, 0.10.0-0.10.2, 0.11.0-0.11.1, 0.12.0-0.12.5, 0.13.0, 0.14.0-0.14.2, 0.15.0, 0.16.0-0.16.4, 0.17.0, 0.18.0, 0.19.1-0.19.2, 0.20.0, 0.21.0] or uninstalled
 ├─restricted to versions [0.11.0-0.11, 0.12.0-0.12, 0.13.0-0.13, 0.14.0-0.14, 0.15.0-0.15, 0.16.0-0.16, 0.17.0-0.17, 0.18.0-0.18, 0.19.0-0.19, 0.20.0-0.20] by MixedModels [ff71e718], leaving only versions [0.11.0-0.11.1, 0.12.0-0.12.5, 0.13.0, 0.14.0-0.14.2, 0.15.0, 0.16.0-0.16.4, 0.17.0, 0.18.0, 0.19.1-0.19.2, 0.20.0]
 │ └─MixedModels [ff71e718] log:
 │   ├─possible versions are: 2.0.0 or uninstalled
 │   └─MixedModels [ff71e718] is fixed to version 2.0.0
 └─restricted to versions 0.21.0 by an explicit requirement — no versions left

(v1.1) pkg> st Distributions
    Status `~/.julia/environments/v1.1/Project.toml`
  [31c24e10] Distributions v0.21.0
  [1fd47b50] QuadGK v2.1.0
  [2913bbd2] StatsBase v0.31.0
  [4c63d2b9] StatsFuns v0.8.0

Thanks for the report. The Project.toml file for MixedModels#master allows up to v"0.20" of Distributions, and I have just pushed a commit to allow v"0.21". On my systems, however, I still have Distributions at v"0.19.2".

It looks as if MixedModels depends on some other package that restricts the version of Distributions. Can anyone suggest how I would determine the reverse dependencies of Distributions and intersect them with the direct and implied dependencies of MixedModels?

As one might expect, I only use Distributions in one place and that usage is for a call that has been part of the package since its inception.

Will the

fit(LinearMixedModel,
    @formula(Y ~ 1 + A + B + (1|G)),
    data)

syntax be supported in MixedModels v2.0.0 or just the

fit!(LinearMixedModel(@formula(Y ~ 1 + A + B + (1|G)),
                      data))

syntax?

On that note, it seems the ability to pass contrasts has been removed on master.

The reason I prefer the fit!(LinearMixedModel(@formula(...), ...)) form is that there are arguments to the fit! method that don’t apply to the model construction. Things like verbose, maxiter, …

Also, the user can modify the form of the model between the time it is constructed and the time it is fit with fit!. For a LinearMixedModel the user can switch to REML estimation. For a GeneralizedLinearMixedModel, nAGQ, the number of quadrature points in the adaptive Gauss-Hermite evaluation, can be changed.
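To make the construct-then-fit pattern concrete, here is a minimal, self-contained sketch. It does not use MixedModels at all: ToyModel, its fields, and this fit! method are hypothetical stand-ins, and the "model" is just a variance estimate whose divisor depends on a REML-style flag. The point is only that the object can be adjusted after construction and that fitting options such as verbose belong to fit!.

```julia
# Hypothetical stand-in for a mixed model; these are NOT the MixedModels.jl types.
mutable struct ToyModel
    y::Vector{Float64}
    reml::Bool          # estimation criterion, adjustable before fitting
    estimate::Float64   # filled in by fit!
    fitted::Bool
end

# Constructor: only information needed to build the model goes here.
ToyModel(y::AbstractVector{<:Real}) = ToyModel(collect(Float64, y), false, NaN, false)

# Fitting options (here just `verbose`) are arguments of fit!, not of the constructor.
function fit!(m::ToyModel; verbose::Bool=false)
    n = length(m.y)
    μ = sum(m.y) / n
    ss = sum(abs2, m.y .- μ)
    # Divide by n - 1 when the REML-style flag is set, by n otherwise.
    m.estimate = ss / (m.reml ? n - 1 : n)
    m.fitted = true
    verbose && println("variance estimate = ", m.estimate)
    return m
end

m = ToyModel([1.0, 2.0, 3.0, 4.0])
m.reml = true            # modify the model after construction, before fitting
fit!(m; verbose=true)
```

Because fit! returns the model, the same thing can be written in one expression when no adjustment is needed, e.g. fit!(ToyModel([1.0, 2.0, 3.0, 4.0])).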

In the vanilla case, it seems to me that there is little overhead in writing fit!(LinearMixedModel(@formula(...), ...)) compared to fit(LinearMixedModel, @formula(...), ...), but I will admit that I sometimes botch the first form myself by typing a comma after LinearMixedModel.

It probably wouldn’t be a big deal to allow the fit(LinearMixedModel, @formula(...), ...) form as an alternative in the vanilla case. Do you think it is worthwhile?

From a design standpoint, I do agree that it is nice to separate arguments that are part of the model being created from those used in the fitting process. However, I don’t think passing both through a single fit method has to create ambiguity. Since they are all keyword arguments, the fit method can forward the model-building arguments to the constructor and the fitting arguments to fit! without issues. For users, it might be easier not to have to work out whether an argument counts as part of the model or part of the fitting process, which may vary depending on the implementation and internals. I tend to favor smart/robust interfaces, which might take a bit more logic on the developer side but are closer to fool-proof for end users.
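A rough sketch of what I mean by forwarding keyword arguments, using a made-up type rather than anything from MixedModels: SketchModel, its reml keyword, and the fit and fit! methods below are all hypothetical and exist only to show the routing.

```julia
# Hypothetical model type for illustration; NOT part of MixedModels.jl.
mutable struct SketchModel
    y::Vector{Float64}
    reml::Bool
    estimate::Float64
end

# Model-building keyword (`reml`) is accepted by the constructor.
SketchModel(y; reml::Bool=false) = SketchModel(collect(Float64, y), reml, NaN)

# Fitting keyword (`verbose`) is accepted by fit!.
function fit!(m::SketchModel; verbose::Bool=false)
    n = length(m.y)
    μ = sum(m.y) / n
    m.estimate = sum(abs2, m.y .- μ) / (m.reml ? n - 1 : n)
    verbose && println("estimate = ", m.estimate)
    return m
end

# One-call form: route each keyword to the step it belongs to,
# so the caller does not need to know which is which.
function fit(::Type{SketchModel}, y; reml::Bool=false, verbose::Bool=false)
    return fit!(SketchModel(y; reml=reml); verbose=verbose)
end

a = fit(SketchModel, [1, 2, 3, 4]; reml=true)      # one-call form
b = fit!(SketchModel([1, 2, 3, 4]; reml=true))     # two-step form
```

Both forms give the same result here, and the two-step form stays available for anyone who wants to adjust the model before fitting.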

The other point in favor of the fit method is that it allows a seamless interface with other packages. For example, a package may choose a model type based on features of the formula, but if one implementation provides only fit!, that package needs more elaborate handling.

Lastly, if only fit! is provided, the fit method should throw the default "not implemented for this type" exception rather than the current error behavior.

Disclosure: losing the fit method would force me to update the code in Bioequivalence.jl, so I would rather just handle it at the MixedModels level for the next release. :slight_smile:

Any changes related to the ability to handle datasets larger than RAM?

Popular request at JuliaCon for the whole stats ecosystem… I don’t think anything has happened just yet, but we are well aware of the need…