There seems to be a lot going on here.
- MixedModels.jl does support GLMMs with offsets (added in March 2021); see the first sketch after this list.
- MixedModels.jl does not support negative binomial models and, as of right now, we don't have any plans to add that support. The negative binomial is only in the exponential family under certain restrictions, and it's challenging to get that to work well with mixed models. (See also the rather brittle support for negative binomial in R's lme4, which is the spiritual predecessor to MixedModels.jl. R's glmmTMB is a little better on this front because it uses a more general but less optimized approach; I've thought about implementing something similar in Julia as an additional package, but I have a lot of other stuff to do and only so many hours to work on FOSS.) More specifically, the negative binomial distribution is only in the exponential family if you fix the first parameter (the number of failures), which is why GLM.jl requires you to estimate the dispersion separately: you can determine the number of failures from the location parameter (which is what the GLM framework – not just in GLM.jl but in the general sense used in McCullagh and Nelder's Generalized Linear Models book – estimates) and the dispersion. The second sketch after this list shows both routes in GLM.jl.
- MixedModels.jl will return a result for models with a dispersion parameter, but it's likely wrong. There are a couple of thorny issues here that I haven't had the time to address, because it's a mixture of numerical issues (it seems that we're failing to find the optimum) and theoretical issues (the approximation we're using for the log-likelihood doesn't obviously take the dispersion parameter into account, although several of the quantities evaluated covary with the value of the dispersion parameter), so for a given set of parameter values, we may not be returning the correct log-likelihood. I'm pretty sure it's a mixture of these issues and not just one or the other, because I've also used a very slow but I'm-pretty-sure-correct procedure for evaluating the log-likelihood: optimization still failed, and its value differed from the one the optimized procedure returned. Realistically, I need to spend a few dedicated days working through how the profiled log-likelihood we use for linear mixed models combines with the optimized Laplace approximation we use for GLMMs via the locally weighted linear approximation to see where the missing term is. Then I need to take the work I've been doing to get GLMM optimization to have better starting values (https://github.com/JuliaStats/MixedModels.jl/pull/588) and combine all these things together. Both the theory and the practice parts here are things that I really need days blocked off to focus on, but MixedModels.jl is something I do after I get off of work. All that said, for a given set of parameter values, MixedModels.jl will compute the correct predictions, residuals, etc.; but the log-likelihood is probably wrong (and so the optimizer doesn't find the correct set of parameter values).
- I imagine you can use Turing.jl or Soss.jl to fit a frequentist model, although neither seems to make it easy out of the box. The trick is that sampling methods like HMC require evaluating the gradient of the log density, so if you use that gradient to find the optimum under flat priors, then you have found the maximum likelihood. In other words, you can use a Bayesian PPL and take advantage of the fact that the maximum a posteriori (MAP) estimates under flat priors are equivalent to the maximum likelihood estimates (MLE). The third sketch after this list shows one way to do this in Turing.jl.
- Alternatively, you can often reformulate (the inferential question associated with) a negative binomial model as (an inferential question associated with) a binomial or Poisson model, both of which are supported by MixedModels.jl.
- Splines2.jl and BSplines.jl will help you with splines (see the final sketch after this list). Creating a package that integrates one of these with StatsModels.jl (which provides the formula interface in Julia) is on my long todo list, because that would help me implement GAMMs, but that's not happening in the foreseeable future.
And going back to other comments about the OP's post: the concerns about Julia turning into Python and other general complaints about transitioning to Julia are certainly interesting, but it helps to have a very focused question, especially one with sample data and a minimum working example ("here's where I got stuck"). There seem to be two distinct questions in this post:
- How do I fit negative binomial models with an offset using GLM.jl?
- Why doesn’t MixedModels.jl support models with a dispersion parameter and is this support planned for the future?
Both of these are great questions, but they weren’t immediately obvious in the OP text.