A PR to port SkewNormal to the maintained Distributions.jl from the unmaintained SkewDist.jl has been merged.
When functionality from an unmaintained package is fully subsumed into a well-maintained package, the unmaintained package will be removed from this working list.
Thanks Chad.
Originally my Task View list was only going to include packages w/ probability distributions.
My intention was to ease discovery & reduce fragmentation.
Then I reluctantly made a second list w/ packages for “working” w/ random variables.
Then @juliohm & you gave some additional links. I’m not sure if there should be a separate Task View on PPL in Julia. I’ll include them in the meantime (but I don’t really use PPLs).
Thanks @Albert_Zevelev, I think having a list like this is a really great idea.
In terms of a taxonomy, things are already pretty fuzzy, and will only get fuzzier from here.
The Distributions.jl approach to this is more along the lines of classical statistics. There’s a big collection of “distributions people might want to use”, and the idea is mostly to just grab one and use it. There are just a few combinators like Truncated, but they’re mostly just kind of a bonus feature.
There are other PPLs of course; I mentioned Soss in particular because a Soss model is a distribution. So even if you never do Bayesian inference, you could use it as a handy way to define a distribution over named tuples.
My point is, it’s nice to have a list of commonly-used distributions, but this view is very limited. The real potential is in having flexible ways to create new distributions from existing ones. This goal has a nice intersection with PPL (in the common usage), but they’re not the same thing.
@Albert_Zevelev just a heads up that Discourse blocks editing a post after some time. Maybe it would be a good idea to convert it into a GitHub repo, and share the link of the table you are constructing. People could then contribute to it with PRs.
We had Julia.jl as a community effort for a while, but it is not up-to-date unfortunately. Ideally, we would have a community-driven repository of working/maintained packages for different subjects.
I think that “canonical” distributions (more or less those available in Distributions.jl) are special because they have
1. O(1) IID sampling,
2. fairly accurate implementations for pdf/cdf/quantiles, where applicable,
3. in most cases, moments and other properties implemented.
Transformations of random variables are still random variables, but unless the result is another canonical distribution (eg affine transformations of a normal) they usually break 1 or 2, and almost always break 3. Eg MCMC is pretty much about the question of how to proceed in this case.
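To make the contrast concrete, here is a minimal sketch in Python, using only the standard library so it runs anywhere; it is not Distributions.jl code. `statistics.NormalDist` stands in for a canonical distribution, and the nonlinear map `sin` is an arbitrary choice for illustration. In this particular case sampling survives the transformation, but the cdf and the moments are no longer available in closed form and have to be estimated.

```python
import math
from statistics import NormalDist, fmean

x = NormalDist(mu=0.0, sigma=1.0)

# Affine transformation: the result is again a normal distribution,
# so cdf, quantiles and moments stay available in closed form.
y = x * 2.0 + 3.0
y_mean, y_stdev = y.mean, y.stdev   # 3.0 and 2.0
y_median = y.inv_cdf(0.5)

# Nonlinear transformation z = sin(x): NormalDist has no closed-form
# cdf or moments for this, so we fall back to Monte Carlo estimates.
z_samples = [math.sin(v) for v in x.samples(20_000, seed=0)]
z_mean_estimate = fmean(z_samples)                       # true value is 0 by symmetry
z_cdf_at_half = sum(s <= 0.5 for s in z_samples) / len(z_samples)
```

The affine case stays inside the "canonical" family; the `sin` case already needs simulation for quantities that were one method call before.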
Personally I don’t see the problem with “fragmentation”. Smaller packages are easier to maintain and manage, and a lot of Julia packages pulled this off neatly, eg Tables.jl & friends. I am even pushing for something similar for Distributions:
My thought is to have common infrastructure for generating dependent and independent samples from some law, querying known properties of that law, and formulating its known algebraic properties.
Distributions is too specialised (almost byzantine) to do this, so MeasureTheory.jl explores a space of something simpler but “with more connectors”, where the richer infrastructure can be built on top of it.
Distributions, for example, has no nice way of giving the convolution of laws, even in cases where the convolution is known and has a closed form; GaussianDistributions.jl addresses this.
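As an illustration of what "a closed-form convolution" means here, the sum of two independent normals is again normal, with the means and the variances adding. The sketch below uses Python's standard-library `statistics.NormalDist`, which happens to support this via `+`. It only shows the idea and says nothing about the actual API of GaussianDistributions.jl or Distributions.jl.

```python
from statistics import NormalDist

a = NormalDist(mu=1.0, sigma=3.0)
b = NormalDist(mu=-2.0, sigma=4.0)

# Law of a + b for independent a and b, i.e. the convolution of the two laws.
# Means add, variances add: sigma = sqrt(3**2 + 4**2) = 5.
s = a + b

conv_mean = s.mean       # -1.0
conv_stdev = s.stdev     # 5.0
conv_cdf_at_mean = s.cdf(conv_mean)   # 0.5, since the result is still symmetric
```

The point is that the result is itself a distribution object with pdf/cdf/quantiles, rather than something that has to be approximated numerically.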
Think of it as the common denominator of Distributions.jl and RandomExtensions.jl?
It’s useful to have a package with the modest ambition of providing Julia’s answer to the p-, q-, r- functions for the standard distributions, which Distributions.jl currently does more or less well.
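For readers who don't know the R convention: `pnorm` is the cdf, `qnorm` the quantile function and `rnorm` the random sampler. A rough analogue, sketched with Python's standard-library `statistics.NormalDist` (an illustration of the interface only, not of Distributions.jl), looks like this:

```python
from statistics import NormalDist

d = NormalDist(mu=0.0, sigma=1.0)

p = d.cdf(1.96)            # like R's pnorm(1.96): P(X <= 1.96)
q = d.inv_cdf(0.975)       # like R's qnorm(0.975): the 97.5% quantile
r = d.samples(5, seed=1)   # like R's rnorm(5): five IID draws

# cdf and quantile are inverses of each other, up to floating-point error.
roundtrip_error = abs(d.cdf(d.inv_cdf(0.3)) - 0.3)
```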
When it comes to all the fancy probabilistic programming stuff, is the overarching vision that Julia’s implementation of “the gamma distribution” is ultimately a special case of the same framework that addresses posterior sampling over tree spaces and things like that?
No, the goals are more modest: having something simple which doesn’t tend to get in the way by making a lot of specific traditional assumptions (e.g. all distributions are either float-continuous or integer-discrete, and all samples are either `<: Number` or `<: Array`). You cannot even nicely represent a Spike and Slab or weighted samples in Distributions. This allows doing more fancy things, which is nice of course, but posterior sampling over tree spaces is not the pressing motivation.
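To show why a Spike and Slab fits neither the "continuous" nor the "discrete" box, here is a small sketch in Python (standard library only). The `SpikeAndSlab` class, its fields `w` and `slab`, and its methods are hypothetical names made up for this example. It mixes a point mass at 0 with a normal slab, so its cdf has a jump at 0 and it has no ordinary pdf or pmf.

```python
import random
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class SpikeAndSlab:
    """Hypothetical type: point mass at 0 (weight w) mixed with a normal slab."""
    w: float
    slab: NormalDist

    def cdf(self, x: float) -> float:
        # The spike contributes a step of height w at x = 0.
        spike = 1.0 if x >= 0.0 else 0.0
        return self.w * spike + (1.0 - self.w) * self.slab.cdf(x)

    def sample(self, rng: random.Random) -> float:
        # With probability w return exactly 0, otherwise draw from the slab.
        if rng.random() < self.w:
            return 0.0
        return rng.gauss(self.slab.mean, self.slab.stdev)


d = SpikeAndSlab(w=0.3, slab=NormalDist(0.0, 2.0))
jump = d.cdf(0.0) - d.cdf(-1e-12)   # size of the atom at 0, about 0.3

rng = random.Random(42)
draws = [d.sample(rng) for _ in range(10_000)]
zero_fraction = sum(v == 0.0 for v in draws) / len(draws)
```

The cdf and the sampler are easy to write down; what is missing is a density with respect to a single standard base measure, which is the kind of assumption being discussed.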
Hey look someone found my packages! If there is interest I will update them and make them more accessible. They were just quick implementations I made because I needed the functionality for another project.
Co-authors and I are working (slowly) on a package for multivariate truncated distributions, initially the multivariate normal. It will also be able to fit parameters in cases where desired moments are specified, extending an earlier univariate idea from this paper. Related (although it doesn’t do moment matching) is this R package, MomTrunc.
Any suggestions or comments on how best to fit this into the current ecosystem would be greatly appreciated.
Smaller packages are also harder to discover.
Eg users have searched for SkewNormal w/o knowing about SkewDist.jl.
There are far more eyes on bigger packages such as Distributions.jl. When a distribution is added there, other users regularly report bugs & submit PRs w/ bug fixes & improvements.
For example, the Beta was added years ago by one set of users, but was improved/updated w/ various PRs by other users over the years.
Looking at the data above, users seem less likely to try to maintain & submit PRs to small private packages.
We’ve discussed the pros & cons of this on Discourse & elsewhere.
Good luck w/ your decision & I can’t wait to try out your package.