Random variables in Julia (working list)

I always liked these remarks:

It’s useful to have a package with the modest ambition of providing Julia’s answer to the p-, q-, r- functions for the standard distributions, which Distributions.jl currently does more or less well.

When it comes to all the fancy probabilistic programming stuff, is the overarching vision that Julia’s implementation of “the gamma distribution” is ultimately a special case of the same framework that addresses posterior sampling over tree spaces and things like that?

3 Likes

No, the goals are more modest: having something simple that doesn’t tend to get in the way by making a lot of specific traditional assumptions (e.g. all distributions are either float-continuous or integer-discrete, and all samples are either `<: Number` or `<: Array`). You cannot even nicely represent a spike and slab or weighted samples in Distributions.jl. This allows one to do more fancy things, which is nice of course, but posterior sampling over tree spaces is not the pressing motivation.

4 Likes

Yes, I think it is a very neat package and I am following it with interest.

3 Likes

Hey look someone found my packages! If there is interest I will update them and make them more accessible. They were just quick implementations I made because I needed the functionality for another project.

5 Likes

Also…my God where has the time gone…2 years?

A few updates:
Lognormals.jl
MVN CDF: Distributions.jl doesn’t yet have multi-var CDFs
ThorinDistributions.jl

1 Like

Co-authors and I are working (slowly) on a package for multivariate truncated distributions, initially the multivariate normal. It will also be able to fit parameters in cases where desired moments are specified, extending an earlier univariate idea from this paper. Related (although it doesn’t do moment matching) is this R package, MomTrunc.

Any suggestions or comments on how best to fit this into the current ecosystem would be greatly appreciated.

5 Likes

Put the primitives in their own package, under an MIT license, with extensive unit tests, and packages that want this functionality can just use that.

4 Likes
  1. A pattern we see (at the top of this thread) is that when separate packages are created for individual distributions, they are less likely to be registered and more likely to stop being maintained when the creators no longer need them.
    SkewDist.jl, PearsonDistribution.jl, GeneralizedLambdaDistribution.jl, GKDistribution.jl, PowerLaws.jl, PowerLaw.jl, GenInvGaussian.jl, MNIG.jl, ConditionalMvNormals.jl, RandomMatrices.jl

  2. Smaller packages are also harder to discover.
    E.g., users have searched for SkewNormal w/o knowing about SkewDist.jl

  3. There are far more eyes on bigger packages such as Distributions.jl. When a distribution is added there, other users regularly report bugs & submit PRs w/ bug fixes & improvements.
    For example, the Beta was added years ago by one set of users, but was improved/updated w/ various PRs by other users over the years.
    Looking at the data above, users seem less likely to try to maintain & submit PRs to small private packages.

  4. We’ve discussed the pros & cons of this on Discourse & elsewhere.

Good luck w/ your decision & I can’t wait to try out your package.

1 Like

Thanks for noticing ThorinDistributions.jl! However, it is not yet usable and is still a very unstable project. Someday it might get better though :wink:

2 Likes

:disappointed_relieved: I started working on an unpublished package for random matrices a few weeks ago.
The goal is to gradually improve it over the next few months as I learn more about Julia while working on this project. Hopefully it will be something worth releasing at the end of the summer.

https://github.com/weiyang2048/RandomMatrix.jl

1 Like

LRMoE.jl has several Zero Inflated random variables along w/ Burr and GammaCount.
He has a PR to add Burr to Distributions.jl

@tamasgal
https://github.com/JuliaHEP/LandauDistribution.jl

1 Like

Thanks @Albert_Zevelev! I am already using it and have also contributed :) It will be released today as a Julia package.

2 Likes

Another cheat sheet to compare basic distribution usage with R and Python:
https://github.com/sylvaticus/commonDistributionsInJuliaPythonR

2 Likes

See my WIP:

@mlkrock added a repo w/ a 7-parameter distribution

One of the nice features of Distributions.jl is the ability to create new transformed distributions from existing distributions.

  • MixtureModel([Normal(0,1),Cauchy(0,1)], [0.5,0.5]) returns a new random variable

  • truncated(Cauchy(0,1), 0.25, 1.8)

  • convolve(Cauchy(0,1), Cauchy(5,2))
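
A minimal sketch of the three constructions above, assuming a reasonably recent Distributions.jl is installed. It uses the lowercase `truncated` constructor, which current releases recommend over calling `Truncated` directly. The variable names are only for illustration.

```julia
using Distributions

# 50/50 mixture of a standard normal and a standard Cauchy
mix = MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5])

# Standard Cauchy restricted to the interval [0.25, 1.8]
tr = truncated(Cauchy(0, 1), 0.25, 1.8)

# Sum of two independent Cauchy variables; the result is again a Cauchy
conv = convolve(Cauchy(0, 1), Cauchy(5, 2))

# Each result is an ordinary distribution object, so the usual generics apply
x = rand(mix)            # one draw from the mixture
p = pdf(mix, 0.0)        # mixture density at 0
q = quantile(tr, 0.5)    # median of the truncated Cauchy
```

For the Cauchy case `convolve` has a closed form: locations and scales add, so `params(conv)` should come out as `(5.0, 3.0)`.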

A recent PR proposes folded distributions.
This is cool b/c it automatically allows the user to access a large number of distributions:
folded-Cauchy/folded-normal/Half-Cauchy/half-logistic/half-normal etc
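
To make the idea concrete, here is a rough sketch of what folding does, written as free functions rather than as the API from that PR (which I am not reproducing here). The names `folded_pdf`, `folded_cdf` and `folded_rand` are hypothetical helpers, not part of Distributions.jl. Folding maps X to |X|, so the density at x ≥ 0 is the sum of the original densities at x and -x.

```julia
using Distributions

# Hypothetical helpers, not part of Distributions.jl: the distribution of |X|
folded_pdf(d::ContinuousUnivariateDistribution, x::Real) =
    x < 0 ? 0.0 : pdf(d, x) + pdf(d, -x)

folded_cdf(d::ContinuousUnivariateDistribution, x::Real) =
    x < 0 ? 0.0 : cdf(d, x) - cdf(d, -x)

folded_rand(d::ContinuousUnivariateDistribution) = abs(rand(d))

# Folding a distribution that is symmetric about 0 gives the "half" version
half_normal_density = folded_pdf(Normal(0, 1), 0.3)   # twice the normal density at 0.3
half_cauchy_prob    = folded_cdf(Cauchy(0, 1), 1.0)   # P(|X| <= 1) for a standard Cauchy
```

With a nonzero location, e.g. `folded_pdf(Normal(1, 2), x)`, the same helpers give the general folded-normal rather than the half-normal.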

There has been discussion about a generic ZeroInflated distribution here, here, here
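
Until something generic exists, one possible workaround (a sketch, not what those discussions settled on) is to express zero inflation as a mixture of a point mass at zero and the base distribution. This assumes a Distributions.jl version that ships `Dirac`. The 0.3 inflation weight and the `Poisson(3)` base are made-up illustration values.

```julia
using Distributions

# Zero-inflated Poisson as a two-component mixture:
# with probability 0.3 the value is a structural zero, otherwise it is Poisson(3)
zi_pois = MixtureModel([Dirac(0), Poisson(3)], [0.3, 0.7])

p0 = pdf(zi_pois, 0)    # 0.3 + 0.7 * exp(-3): structural plus sampling zeros
mu = mean(zi_pois)      # 0.7 * 3, since the point mass contributes nothing
xs = rand(zi_pois, 10)  # ten draws, with extra zeros relative to Poisson(3)
```

A dedicated ZeroInflated type could presumably offer more (e.g. fitting the inflation weight), which this mixture trick does not attempt.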

Are there other important transformations of random variables not considered yet?
Maybe CensoredDistribution, Conditioned & Derived Statistical Distributions can provide some inspiration?

2 Likes

The way Julia handles this stuff is just miles ahead of other languages thanks to first-class structs etc. In R you would have to write all the rfoo, pfoo, dfoo functions even if they are trivially derived from something else. In Stan you have to write your own logpdf functions etc. as well. The ability to just say stuff like convolve(A,B) is truly fabulous.
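
For anyone coming from R, a small sketch of how the d/p/q/r families map onto Distributions.jl generics. The Gamma parameters and truncation bounds are arbitrary illustration values, and the R calls in the comments are the rough equivalents as I understand them.

```julia
using Distributions

d = Gamma(2.0, 1.5)            # shape 2, scale 1.5

density  = pdf(d, 1.0)         # R: dgamma(1, shape = 2, scale = 1.5)
prob     = cdf(d, 1.0)         # R: pgamma(1, shape = 2, scale = 1.5)
median_d = quantile(d, 0.5)    # R: qgamma(0.5, shape = 2, scale = 1.5)
draws    = rand(d, 5)          # R: rgamma(5, shape = 2, scale = 1.5)

# The same four generics work unchanged on a derived distribution,
# with no extra d/p/q/r functions to write by hand
t = truncated(d, 0.5, 4.0)
t_summary = (pdf(t, 1.0), cdf(t, 1.0), quantile(t, 0.5), rand(t))
```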

I should probably include distributions in my tutorial vignettes I’m working on.

4 Likes

I think a good way to increase confidence in the correctness of our ecosystem is to implement more/better tests of systemically important packages such as Distributions.jl.

E.g.
Popoviciu’s inequality: for any bounded univariate random variable $X \in [m, M]$ we have $\sigma^{2} \leq \frac{1}{4}(M-m)^2$

Maybe some kind of loop over all uni distributions in the pkg, that checks if the RV is bounded & if various inequalities hold?
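
Something along these lines, as a rough sketch. The candidate list is hand-picked for illustration, and a real test would need some way to enumerate and instantiate every univariate type in the package, which I am not attempting here. `popoviciu_bound` and `is_bounded` are hypothetical helper names.

```julia
using Distributions

# Hypothetical helpers: the bound (M - m)^2 / 4 and a boundedness check
popoviciu_bound(d) = (maximum(d) - minimum(d))^2 / 4
is_bounded(d) = isfinite(minimum(d)) && isfinite(maximum(d))

# Hand-picked examples; Normal(0, 1) is unbounded and should be skipped
candidates = [Uniform(0, 1), Beta(2, 5), Bernoulli(0.5), Binomial(10, 0.3),
              truncated(Normal(0, 1), -1.0, 2.0), Normal(0, 1)]

checked = filter(is_bounded, candidates)

# Small tolerance because Bernoulli(0.5) attains the bound exactly
results = [var(d) <= popoviciu_bound(d) + 1e-12 for d in checked]

for (d, ok) in zip(checked, results)
    println(ok ? "ok   " : "FAIL ", d)
end
```

The same loop shape would work for other sanity checks, such as `minimum(d) <= mean(d) <= maximum(d)`.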

1 Like