Random variables in Julia (working list)

Albert_Zevelev · October 21, 2020, 5:18pm

Here is my summary of probability distributions in Julia.
This is a Julia version of CRAN Task View: Probability Distributions.

Packages with distributions:

| Package | Description | Note |
| --- | --- | --- |
| Distributions.jl | fit(), rand(), truncated(), mixture(), convolve(), product_distribution(). 100+ distributions | Maintained. 585 stars. @johnczito @mbesancon |
| StatsFuns.jl | Wraps R. 14 distributions, 10 properties each | Maintained. |
| Rmath.jl | Wraps R. d-p-q-r | Maintained. |
| GSL.jl | Wraps C. 38 distributions, 2 properties each: rand() & cdf() | Maintained. |
| SkewDist.jl | fit() & rand(): SkewNormal, SkewTDist, MvSkewNormal, MvSkewTDist | Not registered. 4 years |
| AlphaStableDistributions.jl | fit() & rand(): AlphaStable, AlphaSubGaussian | Maintained. @baggepinnen |
| PearsonDistribution.jl | fit(): PearsonI - PearsonVII | Not registered. 2 years. @bdeonovic |
| GeneralizedLambdaDistribution.jl | fit(): GLD | Not registered. 2 years. @bdeonovic |
| GKDistribution.jl | fit(): GK | Not registered. 5 years. @bdeonovic |
| PowerLaws.jl | fit(): continuous/discrete power laws | 5 years. |
| PowerLaw.jl | fit(): continuous/discrete power laws | 5 years. |
| GenInvGaussian.jl | rand(): Generalized Inverse Gaussian | 5 years. |
| MNIG.jl | Multivariate Normal Inverse Gaussian | 2 years. |
| QuadraticFormsMGHyp.jl | cdf(), E(): qfmgh | Maintained. @s-broda |
| ProjectManagement.jl | rand(): PertBeta | Maintained. @oxinabox |
| TweedieDistributions.jl | rand(): Tweedie, CompoundPoissonGamma | Maintained. @jkbest2 |
| ConditionalMvNormals.jl | condition() | Not registered. 3 years. @jkbest2 |
| RandomMatrices.jl | matrix-valued random variables | Not maintained. |
| RandomMatrixDistributions.jl | matrix-valued random variables | Maintained |

Packages for numerical Expectations of functions of random variables:

| Package | Description | Note |
| --- | --- | --- |
| Distributions.jl | `expectation(dist, g)` uses QuadGK.jl, generic | Maintained. 585 stars on GitHub. |
| Expectations.jl | Uses FastGaussQuadrature.jl, for certain distributions | Maintained |
| DistQuads.jl | Uses FastGaussQuadrature.jl, generic | 2 years. @pkofod |
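
All three packages compute E[g(X)] = ∫ g(x) f(x) dx by numerical quadrature. As a rough, language-agnostic illustration of that idea (plain Python, standard library only; this is not the API of any package listed above), the sketch below uses a simple trapezoid rule in place of the Gauss-type quadrature rules those packages rely on. The helper name `expectation_trapezoid` and its `n` and `width` parameters are made up for this example.

```python
from statistics import NormalDist


def expectation_trapezoid(dist, g, n=2001, width=10.0):
    """Approximate E[g(X)] for X ~ dist (a NormalDist) with the trapezoid rule.

    The integral of g(x) * pdf(x) is truncated to mean +/- width * stdev,
    which assumes g does not grow fast enough to matter in the far tails.
    """
    lo = dist.mean - width * dist.stdev
    hi = dist.mean + width * dist.stdev
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # half weight at both endpoints
        total += weight * g(x) * dist.pdf(x)
    return total * h


d = NormalDist(mu=1.0, sigma=2.0)
second_moment = expectation_trapezoid(d, lambda x: x * x)  # exact value: mu^2 + sigma^2
total_mass = expectation_trapezoid(d, lambda x: 1.0)       # exact value: 1
```

A generic rule like this works for any density you can evaluate, which is the trade-off the "generic" vs "for certain distributions" notes in the table point at: distribution-specific Gauss rules need far fewer nodes but must be derived per family.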

Packages for fitting distributions:

| Package | Description | Note |
| --- | --- | --- |
| FittingDistributions.jl | fit by reducing KL divergence | 2 years |
| GaussianMixtures.jl | fit Gaussian Mixtures w/ EM | Maintained. @davidavdav |
| MixtureModels.jl | Finite mixture models | 7 years |
| MixFit.jl | fit mixtures w/ random-swap EM | Maintained |
| BayesianNonparametrics.jl | | 2 years |
| BayesianMixtures.jl | nonparametric Bayesian mixture models | 3 years |
| DPMM.jl | | 1 year |
| BIAS.jl | | 5 years |

Packages for working with distributions:

| Package | Description | Note |
| --- | --- | --- |
| ConditionalDists.jl | Conditional probability distributions powered by Flux.jl and Distributions.jl. | Maintained. @vitskvara |
| AlgebraPDF.jl | Create/fit/sample custom distributions | Maintained. @misha_mikhasenko |
| DensityRatioEstimation.jl | Estimate the density ratio | Maintained |
| ZeroInflatedDistributions.jl | Construct & work w/ Zero-inflated distributions | Maintained. @jkbest2 |
| ExtremeStats.jl | Fit heavy-tail distributions | Maintained |
| GeoStats.jl | Spatial distributions | Maintained |
| MeasureTheory.jl | make any Distribution easily usable as a Measure | Maintained. @cscherrer |
| MultivariateMoments.jl | moments of multivariate measures | Maintained. @blegat |

PPLs from Chad’s list:

| Package | Description | Note |
| --- | --- | --- |
| Gen.jl | PPL | 1.5k stars |
| Turing.jl | PPL | 1k stars |
| Stheno.jl | PPL | 209 stars. @willtebbutt |
| Soss.jl | PPL | 186 stars. @cscherrer |
| Stan.jl | Wrapper | 154 stars |
| ForneyLab.jl | PPL | 81 stars |
| Omega.jl | PPL | 70 stars. @zennatavares |
| Poirot.jl | PPL | 58 stars. @MikeInnes |
| Jaynes.jl | PPL | 34 stars |

To be categorized:

| Package | Description | Note |
| --- | --- | --- |
| EmpiricalDistributions.jl | EmpiricalDistributions | Maintained. @oschulz |
| EmpiricalCDFs.jl | | Maintained. @jlapeyre |
| InterpolatedPDFs.jl | | Maintained. @m-wells |
| KDEstimation.jl | | Maintained. @m-wells |
| ConjugatePriors.jl | | Maintained. @oschulz |
| CalibrationErrorsDistributions.jl | | Maintained. @devmotion |
| BayesianTools.jl | product() & link() | Not maintained |
| Divergences.jl | Divergences between two dist | Not maintained |
| GaussianDistributions.jl | | Maintained. @mschauer |
| BAT.jl | Bayesian analysis toolkit | Maintained |
| SMC.jl | Sequential Monte Carlo alternative to MH MCMC | Maintained |
| DynamicHMC.jl | | Maintained. @Tamas_Papp |
| HigherOrderKernels.jl | | Maintained |

please comment w/ relevant links I missed
note how much more fragmented distributions in R are compared to Julia.
note how much easier it is to work w/ random variables in Julia than R.
See my cheatsheet comparing Julia/Matlab/Base R/STATA.
A PR to port SkewNormal to the maintained Distributions.jl from the unmaintained SkewDist.jl has been merged.
When functionality from an unmaintained package is fully subsumed into a well-maintained package, the unmaintained package will be removed from this working list.

juliohm · October 21, 2020, 5:35pm

Extreme distributions: https://github.com/JuliaEarth/ExtremeStats.jl
Spatial distributions: https://github.com/JuliaEarth/GeoStats.jl

juliohm · October 21, 2020, 5:44pm

Measure theory: https://github.com/cscherrer/MeasureTheory.jl

juliohm · October 21, 2020, 5:46pm

Density ratio estimation: https://github.com/JuliaML/DensityRatioEstimation.jl

cscherrer · October 21, 2020, 8:30pm

A Soss.jl model is a distribution over named tuples. We support rand, logpdf, all the usual PPL stuff, and also causal interventions.

I’d say @zennatavares’s Omega.jl is also in this neighborhood.

And @mschauer has GaussianDistributions.jl.

Albert_Zevelev · October 21, 2020, 9:49pm

Thanks Chad.
Originally my Task View list was only going to include packages w/ probability distributions.
My intention was to ease discovery & reduce fragmentation.

Then I reluctantly made a second list w/ packages for “working” w/ random variables.
Then @juliohm & you gave some additional links. I’m not sure if there should be a separate Task View on PPL in Julia. I’ll include them in the meantime (but I don’t really use PPLs).

cscherrer · October 21, 2020, 10:13pm

Thanks @Albert_Zevelev, I think having a list like this is a really great idea.

In terms of a taxonomy, things are already pretty fuzzy, and will only get fuzzier from here.

The Distributions.jl approach to this is more along the lines of classical statistics. There’s a big collection of “distributions people might want to use”, and the idea is mostly to just grab one and use it. There are just a few combinators like `Truncated`, but they’re mostly just kind of a bonus feature.

But then there are things like Bijectors.jl from the Turing team (mostly @mohamed82008, @torfjelde, and @devmotion) and TransformVariables.jl from @Tamas_Papp. These define transforms from a “base distribution”. Whether or not the result is a `Distribution` (the Distributions.jl type), it’s certainly a distribution in the mathematical sense.
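
To make "a transform from a base distribution" concrete, here is a minimal change-of-variables sketch. It is plain Python with the standard library only, not the Bijectors.jl or TransformVariables.jl API, and the function names are hypothetical. With base X ~ Normal(0, 1) and Y = exp(X), the density of Y follows from log p_Y(y) = log p_X(log y) - log y, and sampling is just pushing base draws through the transform.

```python
import math
import random
from statistics import NormalDist

base = NormalDist(0.0, 1.0)  # the "base distribution"


def logpdf_exp_of_normal(y):
    """Log-density of Y = exp(X), X ~ base, via the change-of-variables formula."""
    if y <= 0.0:
        return -math.inf          # Y = exp(X) is supported on (0, inf) only
    x = math.log(y)               # inverse transform
    log_jacobian = -x             # log |d(log y)/dy| = -log y
    return math.log(base.pdf(x)) + log_jacobian


def sample_exp_of_normal(rng):
    """Draw from Y by transforming a draw from the base distribution."""
    return math.exp(rng.gauss(base.mean, base.stdev))


rng = random.Random(0)
draws = [sample_exp_of_normal(rng) for _ in range(5)]
value_at_2 = logpdf_exp_of_normal(2.0)
```

The result here happens to be the log-normal, which has a name, but the same two functions work for any invertible, differentiable transform whether or not the result is a named family.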

There are other PPLs of course; I mentioned Soss in particular because a Soss model is a distribution. So even if you never do Bayesian inference, you could use it as a handy way to define a distribution over named tuples.

My point is, it’s nice to have a list of commonly-used distributions, but this view is very limited. The real potential is in having flexible ways to create new distributions from existing ones. This goal has a nice intersection with PPL (in the common usage), but they’re not the same thing.

juliohm · October 22, 2020, 1:37am

@Albert_Zevelev just a heads up that Discourse blocks editing a post after some time. Maybe it would be a good idea to convert it into a GitHub repo, and share the link of the table you are constructing. People could then contribute to it with PRs.

We had Julia.jl as a community effort for a while, but it is not up-to-date unfortunately. Ideally, we would have a community-driven repository of working/maintained packages for different subjects.

juliohm · October 22, 2020, 1:39am

In particular, I tried to update the Prob & Stats section some time ago, and I think I am the current maintainer: https://github.com/svaksha/Julia.jl/blob/master/Probability-Statistics.md. We definitely need updates.

Tamas_Papp · October 22, 2020, 8:51am

I think that “canonical” distributions (more or less those available in Distributions.jl) are special because they have

1. O(1) IID sampling,
2. fairly accurate implementations for pdf/cdf/quantiles, where applicable,
3. in most cases, moments and other properties implemented.

Transformations of random variables are still random variables, but unless the result is another canonical distribution (eg affine transformations of a normal) they usually break 1 or 2, and almost always break 3. Eg MCMC is pretty much about the question of how to proceed in this case.
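
A small illustration of that contrast, sketched in plain Python (standard library only, nothing here is Distributions.jl API). An affine map of a normal is again a normal, so quantiles and moments stay closed-form. For a nonlinear map, the example g(x) = x² + sin(x) being an arbitrary choice, there is no such object to return, and the sketch falls back to a Monte Carlo estimate of the mean.

```python
import math
import random
from statistics import NormalDist, fmean

x = NormalDist(0.0, 1.0)

# Affine transformation: the result is another canonical distribution,
# so cdf, quantiles and moments remain available in closed form.
y = x * 2.0 + 3.0            # NormalDist(mu=3.0, sigma=2.0)
median_y = y.inv_cdf(0.5)

# Nonlinear transformation: no closed-form distribution object comes back,
# so properties such as the mean have to be estimated from samples.
rng = random.Random(0)
draws = [rng.gauss(x.mean, x.stdev) for _ in range(100_000)]
transformed = [v * v + math.sin(v) for v in draws]
mc_mean = fmean(transformed)  # true value is E[X^2] + E[sin X] = 1 + 0
```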

Personally I don’t see the problem with “fragmentation”. Smaller packages are easier to maintain and manage, and a lot of Julia packages have pulled this off neatly, eg Tables.jl & friends. I am even pushing for something similar for Distributions:

github.com/JuliaStats/Distributions.jl

RFC: DistributionsBase.jl for packages defining custom distributions

opened 10:15AM - 17 Jun 20 UTC

tpapp

Distributions.jl is a high-quality implementation of many commonly used distributions, benefiting from continuous contributions and peer review from the members of the Julia community. While it is natural to contribute commonly used distributions to this package, some distributions may not be generic enough to warrant this (eg distributions used in a very narrow subfield, invented for a specific application and not in widespread use yet, etc). We should nevertheless make it easy for distributions living in other packages to share the API without incurring the cost of dependencies that are used by Distributions.jl. Of course, distributions from such packages could be migrated to this one later on if necessary.

I am proposing that a minimal API is extracted to a small, lightweight package, which could be called DistributionsBase.jl, defining

1. the *functions* `cdf`, `pdf`, ... that operate on Distributions (from [the current export list](https://github.com/JuliaStats/Distributions.jl/blob/20c91d9efcc5f96913bf8e38be2e3fb14b21942b/src/Distributions.jl#L28-L254)),
2. a `@reexport_DistributionsBase` macro that reexports these, for use by packages to make it easy to maintain a consistent common exported interface.

I am not sure that I would export the type hierarchy in the first pass, as I don't think it would be used commonly by packages defining custom distributions. In any case, I would start with the bare minimum, more can be added on demand later. (related: #525)

mschauer · October 22, 2020, 9:03am

My thought is to have common infrastructure for generating dependent and independent samples from some law, querying known properties of that law, and formulating its known algebraic properties.

Distributions is too specialised (almost byzantine) to do this, so MeasureTheory.jl explores a space of something simpler but “with more connectors”, where the richer infrastructure can be built on top of it.
Distributions, for example, has no nice way of giving the convolution of laws, even in cases where the convolution is known and has a closed form; GaussianDistributions.jl addresses this.

Think of it as the common denominator of Distributions.jl and RandomExtensions.jl?
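
As an example of the kind of "known algebraic property" meant here: the law of the sum of two independent Gaussians is again Gaussian, with the means and variances adding. The sketch below shows this in plain Python, whose standard-library `NormalDist` overloads `+` for exactly this case. It is only an illustration of the idea, not the GaussianDistributions.jl or MeasureTheory.jl interface.

```python
from statistics import NormalDist

a = NormalDist(mu=1.0, sigma=3.0)
b = NormalDist(mu=2.0, sigma=4.0)

# Closed-form convolution: means add, variances add (sigma = sqrt(3^2 + 4^2) = 5).
s = a + b

# Sanity check by simulation: sum independent draws and refit a normal.
xs = a.samples(50_000, seed=1)
ys = b.samples(50_000, seed=2)
empirical = NormalDist.from_samples([x + y for x, y in zip(xs, ys)])
```

The closed-form object `s` is obtained without any sampling or numerical integration, which is the point: where the algebra is known, the interface can expose it directly.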

johnczito · October 22, 2020, 10:07am

I always liked these remarks:

It’s useful to have a package with the modest ambition of providing Julia’s answer to the p-, q-, r- functions for the standard distributions, which Distributions.jl currently does more or less well.

When it comes to all the fancy probabilistic programming stuff, is the overarching vision that Julia’s implementation of “the gamma distribution” is ultimately a special case of the same framework that addresses posterior sampling over tree spaces and things like that?

mschauer · October 22, 2020, 10:20am

No, the goals are more modest: having something simple that doesn’t tend to get in the way by making a lot of specific traditional assumptions (e.g. all distributions are either float-continuous or integer-discrete, and all samples are either `<: Number` or `<: Array`). You cannot even nicely represent a Spike and Slab or weighted samples in Distributions. This allows doing more fancy things, which is nice of course, but posterior sampling over tree spaces is not the pressing motivation.
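
To show why a spike and slab does not fit a continuous-or-discrete split, here is a minimal sketch in plain Python (standard library only; the function names and the 0.3 spike weight are arbitrary choices for the example). The law puts a point mass at 0 and spreads the remaining mass over a Gaussian slab, so samples are floats, yet the CDF jumps at 0 and the law has no density with respect to Lebesgue measure.

```python
import random
from statistics import NormalDist


def spike_and_slab_sample(rng, p_spike, slab):
    """With probability p_spike return exactly 0.0, otherwise draw from the slab."""
    if rng.random() < p_spike:
        return 0.0
    return rng.gauss(slab.mean, slab.stdev)


def spike_and_slab_cdf(x, p_spike, slab):
    """CDF of the mixture: continuous slab part plus a jump of size p_spike at 0."""
    jump = p_spike if x >= 0.0 else 0.0
    return (1.0 - p_spike) * slab.cdf(x) + jump


slab = NormalDist(0.0, 1.0)
p_spike = 0.3
rng = random.Random(0)
draws = [spike_and_slab_sample(rng, p_spike, slab) for _ in range(20_000)]
zero_fraction = sum(1 for v in draws if v == 0.0) / len(draws)
jump_at_zero = spike_and_slab_cdf(0.0, p_spike, slab) - spike_and_slab_cdf(-1e-9, p_spike, slab)
```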

Tamas_Papp · October 22, 2020, 11:41am

Yes, I think it is a very neat package and I am following it with interest.

bdeonovic · October 23, 2020, 1:24pm

Hey look someone found my packages! If there is interest I will update them and make them more accessible. They were just quick implementations I made because I needed the functionality for another project.

bdeonovic · October 23, 2020, 1:25pm

Also…my God where has the time gone…2 years?

Albert_Zevelev · February 12, 2021, 6:36am

A few updates:
Lognormals.jl
MVN CDF: Distributions.jl doesn’t yet have multi-var CDFs
ThorinDistributions.jl

yoninazarathy · February 12, 2021, 11:40am

Co-authors and I are working (slowly) on a package for multivariate truncated distributions, initially the multivariate normal. It will also be able to fit parameters in cases where desired moments are specified, extending some early univariate ideas in this paper. Related (although it doesn’t do moment matching) is this R package, MomTrunc.

Any suggestions or comments on how best to fit this into the current ecosystem would be greatly appreciated.
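
For readers unfamiliar with moment matching under truncation, here is a deliberately simple univariate sketch in plain Python (standard library only). It is not the method of the package or paper mentioned above, and the helper names and bracketing heuristic are assumptions made for the example. It uses the standard formula for the mean of a normal truncated to (a, b), E[X | a < X < b] = μ + σ (φ(α) − φ(β)) / (Φ(β) − Φ(α)) with α = (a − μ)/σ and β = (b − μ)/σ, and then solves for the μ that hits a specified mean by bisection.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, supplies phi (pdf) and Phi (cdf)


def truncated_normal_mean(mu, sigma, a, b):
    """Mean of N(mu, sigma^2) conditioned on a < X < b."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    mass = STD.cdf(beta) - STD.cdf(alpha)   # probability of landing in (a, b)
    return mu + sigma * (STD.pdf(alpha) - STD.pdf(beta)) / mass


def match_mean(target, sigma, a, b, iters=100):
    """Find mu such that the truncated mean equals target (hypothetical helper).

    The truncated mean is increasing in mu, so bisection applies. The bracket
    [a - 5*sigma, b + 5*sigma] is a heuristic: targets very close to a or b
    fall outside it and are not handled.
    """
    lo, hi = a - 5.0 * sigma, b + 5.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_normal_mean(mid, sigma, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mu_hat = match_mean(target=1.2, sigma=1.0, a=0.0, b=2.0)
fitted_mean = truncated_normal_mean(mu_hat, 1.0, 0.0, 2.0)
```

Note that the fitted `mu_hat` is not 1.2: truncation pulls the mean toward the middle of the interval, so the untruncated location has to overshoot the target.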

Tamas_Papp · February 14, 2021, 10:24am

Put the primitives in their own package, under an MIT license, with extensive unit tests, and packages that want this functionality can just use that.

Albert_Zevelev · February 15, 2021, 12:15am

A pattern we see (at the top of this) is that when separate packages are created for individual distributions, they are less likely to be registered and more likely to stop being maintained when the creators no longer need them.
SkewDist.jl, PearsonDistribution.jl, GeneralizedLambdaDistribution.jl, GKDistribution.jl, PowerLaws.jl, PowerLaw.jl, GenInvGaussian.jl, MNIG.jl, ConditionalMvNormals.jl, RandomMatrices.jl
Smaller packages are also harder to discover.
Eg users have searched for SkewNormal w/o knowing about SkewDist.jl
There are far more eyes on bigger packages such as Distributions.jl. When a distribution is added there, other users regularly report bugs & submit PRs w/ bug fixes & improvements.
For example, the Beta was added years ago by one set of users, but was improved/updated w/ various PRs by other users over the years.
Looking at the data above, users seem less likely to try to maintain & submit PRs to small private packages.
We’ve discussed the pros & cons of this on Discourse & elsewhere.

Good luck w/ your decision & I can’t wait to try out your package.
