Random variables in Julia (working list)

Extreme distributions: https://github.com/JuliaEarth/ExtremeStats.jl
Spatial distributions: https://github.com/JuliaEarth/GeoStats.jl

Measure theory: https://github.com/cscherrer/MeasureTheory.jl

Density ratio estimation: https://github.com/JuliaEarth/DensityRatioEstimation.jl

A Soss.jl model is a distribution over named tuples. We support rand, logpdf, all the usual PPL stuff, and also causal interventions.

I’d say @zennatavares’s Omega.jl is also in this neighborhood.

And @mschauer has GaussianDistributions.jl.

Thanks Chad.
Originally my Task View list was only going to include packages w/ probability distributions.
My intention was to ease discovery & reduce fragmentation.

Then I reluctantly made a second list w/ packages for “working” w/ random variables.
Then @juliohm & you gave some additional links. I’m not sure if there should be a separate Task View on PPL in Julia. I’ll include them in the meantime (but I don’t really use PPLs).

Thanks @Albert_Zevelev, I think having a list like this is a really great idea.

In terms of a taxonomy, things are already pretty fuzzy, and will only get fuzzier from here.

The Distributions.jl approach to this is more along the lines of classical statistics. There’s a big collection of “distributions people might want to use”, and the idea is mostly to just grab one and use it. There are just a few combinators like Truncated, but they’re mostly just kind of a bonus feature.
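
As a rough sketch of that "grab one and use it" workflow (assuming a recent Distributions.jl, where the lowercase `truncated` function builds a `Truncated` distribution; the variable names are only for illustration):

```julia
using Distributions

d = Normal(0.0, 1.0)           # pick a distribution off the shelf
x = rand(d)                    # draw one sample
lp = logpdf(d, 0.0)            # log-density at 0

# One of the few combinators: restrict the support to [0, Inf).
t = truncated(d, 0.0, Inf)
y = rand(t)                    # always nonnegative

# The truncated density is the original one divided by the retained mass
# (1/2 here), so this ratio is approximately 2.
ratio = pdf(t, 0.5) / pdf(d, 0.5)
```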

But then there are things like Bijectors.jl from the Turing team (mostly @mohamed82008, @torfjelde, and @devmotion) and TransformVariables.jl from @Tamas_Papp. These define transforms from a “base distribution”. Whether or not the result is a `Distribution` (the Distributions.jl type), it’s certainly a distribution.

There are other PPLs of course; I mentioned Soss in particular because a Soss model is a distribution. So even if you never do Bayesian inference, you could use it as a handy way to define a distribution over named tuples.

My point is, it’s nice to have a list of commonly-used distributions, but this view is very limited. The real potential is in having flexible ways to create new distributions from existing ones. This goal has a nice intersection with PPL (in the common usage), but they’re not the same thing.

@Albert_Zevelev just a heads up that Discourse blocks editing a post after some time. Maybe it would be a good idea to convert it into a GitHub repo, and share the link of the table you are constructing. People could then contribute to it with PRs.

We had Julia.jl as a community effort for a while, but it is not up-to-date unfortunately. Ideally, we would have a community-driven repository of working/maintained packages for different subjects.

In particular, I tried to update the Prob & Stats section some time ago, I think I am the current maintainer: https://github.com/svaksha/Julia.jl/blob/master/Probability-Statistics.md We definitely need updates.

I think that “canonical” distributions (more or less those available in Distributions.jl) are special because they have

  1. O(1) IID sampling,
  2. fairly accurate implementations for pdf/cdf/quantiles, where applicable,
  3. in most cases, moments and other properties implemented.

Transformations of random variables are still random variables, but unless the result is another canonical distribution (eg affine transformations of a normal) they usually break 1 or 2, and almost always break 3. Eg MCMC is pretty much about the question of how to proceed in this case.
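
To make the three properties concrete, here is a small sketch with Distributions.jl. It assumes a version that defines `+` and `*` between a `Normal` and a real number (I believe recent releases do); if yours doesn't, the same law can be built by hand as in the last line.

```julia
using Distributions

d = Normal(1.0, 2.0)

xs = rand(d, 1_000)            # 1. cheap IID sampling
p  = cdf(d, 1.0)               # 2. cdf at the mean, 0.5 by symmetry
q  = quantile(d, p)            #    quantile inverts the cdf
m, v = mean(d), var(d)         # 3. moments in closed form: 1.0 and 4.0

# An affine map of a normal is again a canonical distribution.
a, b = 3.0, -1.0
e = a * d + b                  # assumed overloads: law of a*X + b
by_hand = Normal(a * mean(d) + b, abs(a) * std(d))
```

A nonlinear map of the same variable generally has no such shortcut, and that is where properties 1 to 3 start to break.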

Personally I don’t see the problem with “fragmentation”. Smaller packages are easier to maintain and manage, and a lot of Julia packages pulled this off neatly, e.g. Tables.jl & friends. I am even pushing for something similar for Distributions:

My thought is to have common infrastructure for generating dependent and independent samples from some law, querying known properties of that law, and formulating its known algebraic properties.

Distributions is too specialised (almost byzantine) to do this, so MeasureTheory.jl explores a space of something simpler but “with more connectors”, where the richer infrastructure can be built on top of it.
Distributions, for example, has no nice way of giving the convolution of laws, even in cases where the convolution is known and has a closed form; GaussianDistributions.jl, for example, addresses this.
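
The closed form in question is simple: for independent X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²), the sum X + Y is N(μ₁ + μ₂, σ₁² + σ₂²). A toy sketch of expressing that as `+` on laws follows. The `Gauss` type is hypothetical, made up for this illustration, and is not the GaussianDistributions.jl API.

```julia
# Hypothetical minimal type for illustration only.
struct Gauss
    μ::Float64    # mean
    σ²::Float64   # variance
end

# Law of X + Y for independent X ~ a and Y ~ b: means add, variances add.
Base.:+(a::Gauss, b::Gauss) = Gauss(a.μ + b.μ, a.σ² + b.σ²)

s = Gauss(0.0, 1.0) + Gauss(1.0, 4.0)   # Gauss(1.0, 5.0)
```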

Think of it as the common denominator of Distributions.jl and RandomExtensions.jl?

I always liked these remarks:

It’s useful to have a package with the modest ambition of providing Julia's answer to the p-, q-, r- functions for the standard distributions, which Distributions.jl currently does more or less well.

When it comes to all the fancy probabilistic programming stuff, is the overarching vision that Julia's implementation of “the gamma distribution” is ultimately a special case of the same framework that addresses posterior sampling over tree spaces and things like that?
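
For readers coming from R, the p-, q-, r- functions in the first remark map onto Distributions.jl roughly as below. This sketch assumes the `Gamma(shape, scale)` parameterization; the R calls in the comments are only there for comparison.

```julia
using Distributions

g = Gamma(2.0, 1.0)            # shape 2, scale 1

dens = pdf(g, 1.5)             # d-: dgamma(1.5, shape = 2, scale = 1)
prob = cdf(g, 1.5)             # p-: pgamma(1.5, shape = 2, scale = 1)
quan = quantile(g, prob)       # q-: qgamma(prob, shape = 2, scale = 1), recovers 1.5
draw = rand(g, 5)              # r-: rgamma(5, shape = 2, scale = 1)
```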

No, the goals are more modest: having something simple that doesn’t tend to get in the way by making a lot of specific traditional assumptions (e.g. that all distributions are either float-continuous or integer-discrete, and that all samples are either `<: Number` or `<: Array`). You cannot even nicely represent a spike and slab or weighted samples in Distributions. This also allows fancier things, which is nice of course, but posterior sampling over tree spaces is not the pressing motivation.
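
To illustrate why a spike and slab fits poorly into a continuous-or-discrete split, here is a sketch of sampling from one using only the standard library. The helper name `spike_slab` and its parameters are made up for this example; it is not an API from any of the packages mentioned.

```julia
using Random

# Hypothetical helper: with probability `w` return the spike (a point mass at 0),
# otherwise draw from the slab, a N(0, σ²) distribution.
spike_slab(rng::AbstractRNG, w::Real, σ::Real) =
    rand(rng) < w ? 0.0 : σ * randn(rng)

rng = MersenneTwister(1)
xs = [spike_slab(rng, 0.3, 2.0) for _ in 1:10_000]

# The law has an atom at 0 and a density elsewhere, so it is neither purely
# discrete nor purely continuous; the fraction of exact zeros is close to w.
frac_zero = count(iszero, xs) / length(xs)
```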

Yes, I think it is a very neat package and I am following it with interest.

Hey look someone found my packages! If there is interest I will update them and make them more accessible. They were just quick implementations I made because I needed the functionality for another project.

Also…my God where has the time gone…2 years?

A few updates:
Lognormals.jl
MVN CDF: Distributions.jl doesn’t yet have multi-var CDFs
ThorinDistributions.jl

Co-authors and I are working (slowly) on a package for multi-variate truncated distributions. Initially multi-variate normal. Also with the ability to fit parameters in cases where desired moments are specified, extending some early univariate idea in this paper. Related (although it doesn’t do moment matching) is this R package, MomTrunc.

Any suggestions or comments on how to best squeeze this in the current eco-system would be greatly appreciated.

Put the primitives in their own package, under an MIT license, with extensive unit tests, and packages that want this functionality can just use that.

  1. A pattern we see (at the top of this thread) is that when separate packages are created for individual distributions, they are less likely to be registered and more likely to stop being maintained once their creators no longer need them.
    SkewDist.jl, PearsonDistribution.jl, GeneralizedLambdaDistribution.jl, GKDistribution.jl, PowerLaws.jl, PowerLaw.jl, GenInvGaussian.jl, MNIG.jl, ConditionalMvNormals.jl, RandomMatrices.jl

  2. Smaller packages are also harder to discover.
    E.g. users have searched for SkewNormal w/o knowing about SkewDist.jl.

  3. There are far more eyes on bigger packages such as Distributions.jl. When a distribution is added there, other users regularly report bugs & submit PRs w/ bug fixes & improvements.
    For example, the Beta was added years ago by one set of users, but was improved/updated w/ various PRs by other users over the years.
    Looking at the data above, users seem less likely to try to maintain & submit PRs to small private packages.

  4. We’ve discussed the pros & cons of this on Discourse & elsewhere.

Good luck w/ your decision & I can’t wait to try out your package.

Thanks for noticing ThorinDistributions.jl! However, this is not yet usable and is still a very unstable project. Someday it might get better though :wink:
