Different parametrizations of probability distributions in Distributions.jl


#1

In R there are several probability distributions that accept keywords for different parametrizations of the same distribution. For example, I use the Negative Binomial distribution a lot in my simulations of some biological processes, and one particular case has me a little stumped.

I have a process that produces a vector of 950 Ints (no zeros), and I want to compare the actual distribution of the data with a theoretical NegBinom distribution. In R there is a parametrization that uses the mean, which I can calculate from my data, and the probability, which I can change to see how that affects the histogram. We don’t have access to that in Julia, and while I’m sure there’s a way to get the parameter r from the mean, I haven’t been able to figure it out.

What would be the best strategy to allow different parametrizations in different distributions? What would it take to change Negative Binomial, as an example? I could probably try it as an exercise. I don’t know if this has been discussed before or whether there are already plans in this direction; if so, I’d appreciate a link to the relevant discussion.


#2

For mean number of failures m and success probability p, it is NegativeBinomial(m*(p/(1 - p)), p), because the mean of NegativeBinomial(r, p) is r*(1 - p)/p and solving for r gives r = m*p/(1 - p). Ideally you would fit p and r directly rather than guessing p from the histogram, but I do not know whether that is implemented anywhere.
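A minimal sketch of that conversion, in case it helps. The helper name r_from_mean and the numeric values are made up for illustration; only NegativeBinomial, mean and params come from Distributions.jl.

```julia
using Distributions

# Hypothetical helper (not part of Distributions.jl): recover r from the
# mean number of failures m and the success probability p.
# mean(NegativeBinomial(r, p)) is r * (1 - p) / p, so r = m * p / (1 - p).
r_from_mean(m, p) = m * p / (1 - p)

m, p = 4.0, 0.25                      # illustrative values, not from the thread
d = NegativeBinomial(r_from_mean(m, p), p)

mean(d) ≈ m                           # the requested mean is recovered
```

Changing p while keeping m fixed then only requires rebuilding d with the same call.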


#3

Thanks, that works! I think implementing different parametrizations would be a nice feature, but I really don’t know what would be the best way to do it.

And I’m not aware of any fitting function for Negative Binomial because there is no implementation of the sufficient statistics.
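In the meantime, a rough starting point could be a method-of-moments estimate rather than a proper maximum-likelihood fit. The sketch below is only that: the function name negbin_moments and the toy data are assumptions, and it fails when the sample variance does not exceed the sample mean.

```julia
using Distributions, Statistics

# Hypothetical helper: method-of-moments estimate for NegativeBinomial(r, p).
# From mean = r(1-p)/p and var = r(1-p)/p^2 it follows that
# p = mean/var and r = mean^2/(var - mean).
function negbin_moments(x::AbstractVector{<:Real})
    m, v = mean(x), var(x)
    v > m || throw(ArgumentError("sample variance must exceed the sample mean"))
    p = m / v
    r = m^2 / (v - m)
    return NegativeBinomial(r, p)
end

x = [0, 1, 1, 2, 3, 5, 8, 13]         # toy counts, not the data from the thread
d = negbin_moments(x)

(mean(d) ≈ mean(x), var(d) ≈ var(x))  # both moments are matched by construction
```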


#4

I understand that different parametrizations are used in different communities/textbooks for some distributions; however, I think that the best approach is to pick one, document it, and use it consistently in code.

Anything else will just lead to confusion and subtle bugs. IMO Distributions already has too many special-cased constructors, e.g. for MvNormal. See also Distributions#584 and the issues referenced there.

Usually the conversion to/from other parametrizations is trivial, so the documentation could mention that. So submitting PRs that extend docstrings would be my suggestion.


#5

I think that, since mean(::NegativeBinomial) is defined even though it is a trivial operation, its moral inverse NegativeBinomial(mean=x, p=2) (or NegativeBinomial((mean=x, p=2)) to avoid conflict) could also be provided. This would be especially useful for the many distributions with rate= vs scale= parameters.
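To make the named-tuple idea concrete, here is a sketch that dispatches on the field names of the tuple. It uses a standalone function, negbin, which is a made-up name; it does not add methods to NegativeBinomial itself, and nothing like it is claimed to exist in Distributions.jl.

```julia
using Distributions

# Hypothetical front end: the field names of the NamedTuple select the
# parametrization, and every method forwards to NegativeBinomial(r, p).
negbin(nt::NamedTuple{(:r, :p)}) = NegativeBinomial(nt.r, nt.p)
negbin(nt::NamedTuple{(:mean, :p)}) =
    NegativeBinomial(nt.mean * nt.p / (1 - nt.p), nt.p)

d = negbin((mean = 4.0, p = 0.25))    # illustrative values
mean(d) ≈ 4.0
```

Note that field order matters with this kind of dispatch: (p = 0.25, mean = 4.0) is a different NamedTuple type and would need its own method.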


#6

Sure, documentation and examples are always good. However, in the discussion you posted, @simonbyrne talks about the possibility of using keyword arguments with one canonical implementation, and I really believe that’s the way to go.
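Roughly what I have in mind, sketched for the rate/scale case. The lowercase exponential function is hypothetical and is not the design from the linked discussion; the only Distributions.jl facts assumed are that Exponential(θ) takes the scale θ and that rate and scale are accessor functions.

```julia
using Distributions

# Hypothetical keyword front end. Exactly one of scale or rate must be
# given; both paths end in the single canonical constructor Exponential(θ).
function exponential(; scale::Union{Real,Nothing} = nothing,
                       rate::Union{Real,Nothing} = nothing)
    if (scale === nothing) == (rate === nothing)
        throw(ArgumentError("give exactly one of scale or rate"))
    end
    θ = scale === nothing ? inv(rate) : scale
    return Exponential(θ)
end

d_rate = exponential(rate = 2.0)
d_scale = exponential(scale = 0.5)
mean(d_rate) ≈ mean(d_scale)          # same distribution, two spellings
```

The internals only ever see the canonical parameter, so the rest of the code stays as consistent as it is today.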