Random variables in Julia (working list)

Here is my summary of probability distributions in Julia.
This is a Julia version of CRAN Task View: Probability Distributions.

Packages with distributions:

| Package | Description | Note |
|---|---|---|
| Distributions.jl | fit(), rand(), truncated(), mixture(), convolve(), product_distribution(). 100+ distributions | Maintained. 585 stars. @johnczito @mbesancon |
| StatsFuns.jl | Wraps R. 14 distributions, 10 properties each | Maintained. |
| Rmath.jl | Wraps R. d-p-q-r | Maintained. |
| GSL.jl | Wraps C. 38 distributions, 2 properties each: rand() & cdf() | Maintained. |
| SkewDist.jl | fit() & rand(): SkewNormal, SkewTDist, MvSkewNormal, MvSkewTDist | Not registered. 4 years |
| AlphaStableDistributions.jl | fit() & rand(): AlphaStable, AlphaSubGaussian | Maintained. @baggepinnen |
| PearsonDistribution.jl | fit(): PearsonI - PearsonVII | Not registered. 2 years. @bdeonovic |
| GeneralizedLambdaDistribution.jl | fit(): GLD | Not registered. 2 years. @bdeonovic |
| GKDistribution.jl | fit(): GK | Not registered. 5 years. @bdeonovic |
| PowerLaws.jl | fit(): continuous/discrete power laws | 5 years. |
| PowerLaw.jl | fit(): continuous/discrete power laws | 5 years. |
| GenInvGaussian.jl | rand(): Generalized Inverse Gaussian | 5 years. |
| MNIG.jl | Multivariate Normal Inverse Gaussian | 2 years. |
| QuadraticFormsMGHyp.jl | cdf(), E(): qfmgh | Maintained. @s-broda |
| ProjectManagement.jl | rand(): PertBeta | Maintained. @oxinabox |
| TweedieDistributions.jl | rand(): Tweedie, CompoundPoissonGamma | Maintained. @jkbest2 |
| ConditionalMvNormals.jl | condition() | Not registered. 3 years. @jkbest2 |
| RandomMatrices.jl | matrix-valued random variables | Not maintained. |
| RandomMatrixDistributions.jl | matrix-valued random variables | Maintained |
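
For readers coming from R, here is a minimal sketch of how the d-p-q-r workflow and fit() look in Distributions.jl. The distribution, its parameters, the seed and the sample size are arbitrary choices for illustration, not anything prescribed by the packages above.

```julia
using Distributions, Random

Random.seed!(1)           # fix the seed so the run is reproducible
d = Normal(1.0, 2.0)      # mean 1, standard deviation 2

pdf(d, 1.0)               # density at the mean (R: dnorm)
cdf(d, 1.0)               # P(X <= 1) (R: pnorm)
quantile(d, 0.5)          # median (R: qnorm)
x = rand(d, 10_000)       # 10,000 IID draws (R: rnorm)

d_hat = fit(Normal, x)    # maximum-likelihood fit to the sample
params(d_hat)             # (mean, sd), close to (1.0, 2.0)
```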

Packages for numerical Expectations of functions of random variables:

| Package | Description | Note |
|---|---|---|
| Distributions.jl | expectation(dist, g) use QuadGK.jl, generic | Maintained. 585 stars on GitHub. |
| Expectations.jl | Use FastGaussQuadrature.jl, for certain distributions | Maintained |
| DistQuads.jl | Use FastGaussQuadrature.jl, generic | 2 years @pkofod |
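
To make the "generic, via QuadGK.jl" idea concrete, here is a rough sketch that computes E[g(X)] by integrating g(x) * pdf(d, x) over the support. `quad_expectation` is a hypothetical helper written for this example, not a function from any of the packages above, and it assumes a continuous univariate distribution whose density is well behaved enough for adaptive quadrature.

```julia
using Distributions, QuadGK

# Hypothetical helper (not a library function):
# E[g(X)] as the integral of g(x) * pdf(d, x) over the support of d.
function quad_expectation(g, d::ContinuousUnivariateDistribution)
    value, _ = quadgk(x -> g(x) * pdf(d, x), minimum(d), maximum(d))
    return value
end

d = Normal(1.0, 2.0)
quad_expectation(x -> x^2, d)   # analytically mean^2 + var = 1 + 4 = 5
```

The packages in the table trade generality for speed in different ways; this sketch only shows the underlying idea.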

Packages for fitting distributions:

| Package | Description | Note |
|---|---|---|
| FittingDistributions.jl | fit by reducing KL divergence | 2 years |
| GaussianMixtures.jl | fit Gaussian Mixtures w/ EM | Maintained. @davidavdav |
| MixtureModels.jl | Finite mixture models | 7 years |
| MixFit.jl | fit mixtures w/ random-swap EM | Maintained |
| BayesianNonparametrics.jl | | 2 years |
| BayesianMixtures.jl | nonparametric Bayesian mixture models | 3 years |
| DPMM.jl | | 1 year |
| BIAS.jl | | 5 years |

Packages for working with distributions:

| Package | Description | Note |
|---|---|---|
| ConditionalDists.jl | Conditional probability distributions powered by Flux.jl and Distributions.jl. | Maintained @vitskvara |
| AlgebraPDF.jl | Create/fit/sample custom distributions | Maintained @misha_mikhasenko |
| DensityRatioEstimation.jl | Estimate the density ratio | Maintained |
| ZeroInflatedDistributions.jl | Construct & work w/ Zero-inflated distributions | Maintained @jkbest2 |
| ExtremeStats.jl | Fit heavy-tail distributions | Maintained |
| GeoStats.jl | Spatial distributions | Maintained |
| MeasureTheory.jl | make any Distribution easily usable as a Measure | Maintained. @cscherrer |
| MultivariateMoments.jl | moments of multivariate measures | Maintained. @blegat |

PPLs from Chad’s list:

| Package | Description | Note |
|---|---|---|
| Gen.jl | PPL | 1.5k stars |
| Turing.jl | PPL | 1k stars |
| Stheno.jl | PPL | 209 stars @willtebbutt |
| Soss.jl | PPL | 186 stars @cscherrer |
| Stan.jl | Wrapper | 154 stars |
| ForneyLab.jl | PPL | 81 stars |
| Omega.jl | PPL | 70 stars @zennatavares |
| Poirot.jl | PPL | 58 stars @MikeInnes |
| Jaynes.jl | PPL | 34 stars |
To be categorized:

| Package | Description | Note |
|---|---|---|
| EmpiricalDistributions.jl | EmpiricalDistributions | Maintained @oschulz |
| EmpiricalCDFs.jl | | Maintained @jlapeyre |
| InterpolatedPDFs.jl | | Maintained @m-wells |
| KDEstimation.jl | | Maintained @m-wells |
| ConjugatePriors.jl | | Maintained @oschulz |
| CalibrationErrorsDistributions.jl | | Maintained @devmotion |
| BayesianTools.jl | product() & link() | Not maintained |
| Divergences.jl | Divergences between two dist | Not maintained |
| GaussianDistributions.jl | | Maintained @mschauer |
| BAT.jl | Bayesian analysis toolkit | Maintained |
| SMC.jl | Sequential Monte Carlo alternative to MH MCMC | Maintained |
| DynamicHMC.jl | | Maintained @Tamas_Papp |
| HigherOrderKernels.jl | | Maintained |

  1. please comment w/ relevant links I missed
  2. note how much more fragmented distributions in R are compared to Julia.
  3. note how much easier it is to work w/ random variables in Julia than R.
    See my cheatsheet comparing Julia/Matlab/Base R/STATA.
  4. A PR to port SkewNormal to the maintained Distributions.jl from the unmaintained SkewDist.jl has been merged.
    When functionality from an unmaintained package is fully subsumed into a well-maintained package, the unmaintained package will be removed from this working list.

Extreme distributions: https://github.com/JuliaEarth/ExtremeStats.jl
Spatial distributions: https://github.com/JuliaEarth/GeoStats.jl


Measure theory: https://github.com/cscherrer/MeasureTheory.jl


Density ratio estimation: https://github.com/JuliaML/DensityRatioEstimation.jl


A Soss.jl model is a distribution over named tuples. We support rand, logpdf, all the usual PPL stuff, and also causal interventions.

I’d say @zennatavares’s Omega.jl is also in this neighborhood.

And @mschauer has GaussianDistributions.jl.


Thanks Chad.
Originally my Task View list was only going to include packages w/ probability distributions.
My intention was to ease discovery & reduce fragmentation.

Then I reluctantly made a second list w/ packages for “working” w/ random variables.
Then @juliohm & you gave some additional links. I’m not sure if there should be a separate Task View on PPL in Julia. I’ll include them in the meantime (but I don’t really use PPLs).


Thanks @Albert_Zevelev, I think having a list like this is a really great idea.

In terms of a taxonomy, things are already pretty fuzzy, and will only get fuzzier from here.

The Distributions.jl approach to this is more along the lines of classical statistics. There’s a big collection of “distributions people might want to use”, and the idea is mostly to just grab one and use it. There are just a few combinators like Truncated, but they’re mostly just kind of a bonus feature.

But then there are things like Bijectors.jl from the Turing team (mostly @mohamed82008, @torfjelde, and @devmotion) and TransformVariables.jl from @Tamas_Papp. These define transforms from a “base distribution”. Whether or not the result is a `Distribution` (the type), it’s certainly a distribution (the mathematical object).

There are other PPLs of course; I mentioned Soss in particular because a Soss model is a distribution. So even if you never do Bayesian inference, you could use it as a handy way to define a distribution over named tuples.

My point is, it’s nice to have a list of commonly-used distributions, but this view is very limited. The real potential is in having flexible ways to create new distributions from existing ones. This goal has a nice intersection with PPL (in the common usage), but they’re not the same thing.
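
As a small illustration of building new distributions from existing ones, here is a sketch using a few combinators that Distributions.jl already ships. The particular components, weights and truncation bounds are arbitrary choices for the example.

```julia
using Distributions

base = Normal(0.0, 1.0)

# Half-normal obtained by truncating the base distribution to [0, Inf).
pos = truncated(base, 0.0, Inf)

# Two-component Gaussian mixture with weights 0.3 and 0.7.
mix = MixtureModel([Normal(-2.0, 1.0), Normal(2.0, 1.0)], [0.3, 0.7])

# Joint law of two independent normal components.
joint = product_distribution([Normal(0.0, 1.0), Normal(5.0, 0.5)])

cdf(pos, 0.0)          # no mass below the truncation point
mean(mix)              # 0.3 * (-2) + 0.7 * 2 = 0.8
length(rand(joint))    # one draw is a vector with 2 entries
```

Bijectors.jl, TransformVariables.jl and the PPLs go well beyond this, but the pattern of wrapping an existing distribution is the same.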


@Albert_Zevelev just a heads up that Discourse blocks editing a post after some time. Maybe it would be a good idea to convert it into a GitHub repo, and share the link of the table you are constructing. People could then contribute to it with PRs.

We had Julia.jl as a community effort for a while, but it is not up-to-date unfortunately. Ideally, we would have a community-driven repository of working/maintained packages for different subjects.


In particular, I tried to update the Prob & Stats section some time ago, and I think I am the current maintainer: https://github.com/svaksha/Julia.jl/blob/master/Probability-Statistics.md We definitely need updates.


I think that “canonical” distributions (more or less those available in Distributions.jl) are special because they have

  1. O(1) IID sampling,
  2. fairly accurate implementations for pdf/cdf/quantiles, where applicable,
  3. in most cases, moments and other properties implemented.

Transformations of random variables are still random variables, but unless the result is another canonical distribution (eg affine transformations of a normal) they usually break 1 or 2, and almost always break 3. Eg MCMC is pretty much about the question of how to proceed in this case.
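
A rough sketch of that distinction, assuming Distributions.jl: an affine map of a normal stays canonical and keeps exact moments, exp of a normal happens to be canonical too (LogNormal), while an arbitrary function `g` (made up for this example) leaves us with Monte Carlo estimates only.

```julia
using Distributions, Random, Statistics

Random.seed!(1)
z = Normal(0.0, 1.0)

# Affine map of a normal is again a canonical distribution: 2Z + 3 ~ Normal(3, 2).
a = 2 * z + 3
mean(a), std(a)              # exact moments: (3.0, 2.0)

# exp(Z) is also canonical (LogNormal), so its mean is available in closed form.
mean(LogNormal(0.0, 1.0))    # exp(1/2)

# g(Z) = sin(Z) + Z^2 has no canonical form here, so fall back to Monte Carlo.
g(x) = sin(x) + x^2
est = mean(g.(rand(z, 100_000)))   # estimate of E[g(Z)] = 0 + 1 = 1
```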

Personally I don’t see the problem with “fragmentation”. Smaller packages are easier to maintain and manage, and a lot of Julia packages pulled this off neatly, eg Tables.jl & friends. I am even pushing for something similar for Distributions:


My thought is to have common infrastructure for generating dependent and independent samples from some law, querying known properties of the law of the random numbers, and formulating their known algebraic properties.

Distributions is too specialised (almost byzantine) to do this, so MeasureTheory.jl explores a space of something simpler but “with more connectors”, where the richer infrastructure can be built on top of it.
Distributions, for example, has no nice way of giving the convolution of laws, even in cases where the convolution is known and has a closed form; GaussianDistributions.jl, for one, addresses this.

Think of it as the common denominator of Distributions.jl and RandomExtensions.jl?


I always liked these remarks:

It’s useful to have a package with the modest ambition of providing Julia’s answer to the p-, q-, r- functions for the standard distributions, which Distributions.jl currently does more or less well.

When it comes to all the fancy probabilistic programming stuff, is the overarching vision that Julia’s implementation of “the gamma distribution” is ultimately a special case of the same framework that addresses posterior sampling over tree spaces and things like that?


No, the goals are more modest: having something simple which doesn’t tend to get in the way by making a lot of specific traditional assumptions (e.g. all distributions are either float-continuous or integer-discrete, and all samples are either `<: Number` or `<: Array`). You cannot even nicely represent a Spike and Slab or weighted samples in Distributions. This allows one to do more fancy things, which is nice of course, but posterior sampling over tree spaces is not the pressing motivation.
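
To show why a Spike and Slab sits awkwardly in a continuous/discrete split, here is a minimal sketch of sampling from one using only the standard library. `rand_spike_slab` is a hypothetical helper written for this example, and the Normal slab, the weight 0.3 and the scale 2.0 are arbitrary assumptions.

```julia
using Random, Statistics

# Hypothetical helper: one draw from a spike-and-slab law, i.e. a point mass
# at 0 with probability p_spike, otherwise a Normal(0, slab_sd) draw.
function rand_spike_slab(rng::AbstractRNG, p_spike::Real, slab_sd::Real)
    return rand(rng) < p_spike ? 0.0 : slab_sd * randn(rng)
end

rng = MersenneTwister(1)
draws = [rand_spike_slab(rng, 0.3, 2.0) for _ in 1:100_000]
mean(iszero, draws)    # fraction of exact zeros, close to 0.3
```

Sampling is easy; the awkward part is that the law has an atom at 0 plus a density elsewhere, so neither a pdf nor a pmf alone describes it.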


Yes, I think it is a very neat package and I am following it with interest.


Hey look someone found my packages! If there is interest I will update them and make them more accessible. They were just quick implementations I made because I needed the functionality for another project.


Also…my God where has the time gone…2 years?

A few updates:
Lognormals.jl
MVN CDF: Distributions.jl doesn’t yet have multi-var CDFs
ThorinDistributions.jl


Co-authors and I are working (slowly) on a package for multivariate truncated distributions, initially the multivariate normal. It will also be able to fit parameters in cases where desired moments are specified, extending an early univariate idea in this paper. Related (although it doesn’t do moment matching) is this R package, MomTrunc.

Any suggestions or comments on how best to fit this into the current ecosystem would be greatly appreciated.


Put the primitives in their own package, under an MIT license, with extensive unit tests, and packages that want this functionality can just use that.

  1. A pattern we see (at the top of this) is that when separate packages are created for individual distributions, they are less likely to be registered and more likely to stop being maintained when the creators no longer need them.
    SkewDist.jl, PearsonDistribution.jl, GeneralizedLambdaDistribution.jl, GKDistribution.jl, PowerLaws.jl, PowerLaw.jl, GenInvGaussian.jl, MNIG.jl, ConditionalMvNormals.jl, RandomMatrices.jl

  2. Smaller packages are also harder to discover.
    Eg users have searched for SkewNormal w/o knowing about SkewDist.jl

  3. There are far more eyes on bigger packages such as Distributions.jl. When a distribution is added there, other users regularly report bugs & submit PRs w/ bug fixes & improvements.
    For example, the Beta was added years ago by one set of users, but was improved/updated w/ various PRs by other users over the years.
    Looking at the data above, users seem less likely to try to maintain & submit PRs to small private packages.

  4. We’ve discussed the pros & cons of this on Discourse & elsewhere.

Good luck w/ your decision & I can’t wait to try out your package.
