I would like to build an identity link negative binomial 1 glm model for some data (where negative binomial 1 means that the dispersion parameter is of the form (1 + theta)*mean). This means I need to simultaneously control both the link function form and the dispersion parameter form.

Is there a library/function I can use to control both?

More generally, I’d be happy if I could specify both a link function and an arbitrary form of the dispersion, only needing to supply the dispersion parameter theta.

Although I am not sure they allow for over dispersion. You might have to use the joint modeling technique : fit another linear model for the overdispersion parameter and then re-run the original one taking it into account, as described in McCullagh & Nelder 1989 “Generalized liner model”. I think the joint dispersion is chapter 8 but I am not sure.

So far as I can tell, this does not let me choose between an NB1 and NB2 (or more generally NB-P) model. These are described in Hilbe’s Negatvie Binomial Regression book.

It is possible to specify such a model in Turing, and I ended up doing so. The unfortunate part is that the time to sample the Turing model accurately on my data is much greater than with, say negbinfit in the GLM library. There are more general advantages to Turing’s Bayesian philosophy of course, but the speed difference can be orders of magnitude, such that it is impractical. The best work-around that I found was to just get the MAP estimate using optim.jl as a substitute for the MLE, but this shortcut is necessarily less accurate and without the benefit of a full model (error bars, etc).

negbifit.jl does estimate the dispersion. Instead of conducting auxiliary regression for theta, as you describe and which I have seen in some other implementations, it uses an MLE loop method to refine theta, constantly fitting new NB glms with fixed thetas until they reach some optimal loglikelihood.

This implementation is just NB2, and having spent time with the source code it would be a bit painful to modify to work with NB-1/NB-P, since the derivatives for the MLE and the loglikelihood for the stopping criteria are hard-coded for NB2 regression.

There are different types of NB likelihoods in GPLikelihoods.jl. They are just mappings from reals to a distribution, so should be able to be used for non-GP models.

Of course I don’t know the scale of your problem.
Just note that you can get the parameter variance-covariance matrix via vcov from MLE/MAP estimates in Turing.