Random variables in Julia (working list)

Hey look someone found my packages! If there is interest I will update them and make them more accessible. They were just quick implementations I made because I needed the functionality for another project.

5 Likes

Also…my God where has the time gone…2 years?

A few updates:
  • Lognormals.jl
  • MVN CDF: Distributions.jl doesn’t yet have multi-var CDFs
  • ThorinDistributions.jl

1 Like

Co-authors and I are working (slowly) on a package for multivariate truncated distributions, initially the multivariate normal, also with the ability to fit parameters in cases where desired moments are specified, extending some early univariate ideas in this paper. Related (although it doesn’t do moment matching) is the R package MomTrunc.

Any suggestions or comments on how to best squeeze this in the current eco-system would be greatly appreciated.

5 Likes

Put the primitives in their own package, under an MIT license, with extensive unit tests, and packages that want this functionality can just use that.

4 Likes
  1. A pattern we see (at the top of this thread) is that when separate packages are created for individual distributions, they are less likely to be registered and more likely to stop being maintained when the creators no longer need them.
    SkewDist.jl, PearsonDistribution.jl, GeneralizedLambdaDistribution.jl, GKDistribution.jl, PowerLaws.jl, PowerLaw.jl, GenInvGaussian.jl, MNIG.jl, ConditionalMvNormals.jl, RandomMatrices.jl

  2. Smaller packages are also harder to discover.
    E.g. users have searched for SkewNormal w/o knowing about SkewDist.jl

  3. There are far more eyes on bigger packages such as Distributions.jl. When a distribution is added there, other users regularly report bugs & submit PRs w/ bug fixes & improvements.
    For example, the Beta was added years ago by one set of users, but was improved/updated w/ various PRs by other users over the years.
    Looking at the data above, users seem less likely to try to maintain & submit PRs to small private packages.

  4. We’ve discussed the pros & cons of this on Discourse & elsewhere.

Good luck w/ your decision & I can’t wait to try out your package.

1 Like

Thanks for noticing ThorinDistributions.jl! However, it is not yet usable and is still a very unstable project. Someday it might get better though :wink:

2 Likes

:disappointed_relieved: I started working on an unpublished package on random matrices a few weeks ago.
The goal is to gradually improve it over the next few months as I learn more about Julia while working on this project, and hopefully it will be something worth releasing at the end of the summer.

https://github.com/weiyang2048/RandomMatrix.jl

1 Like

LRMoE.jl has several zero-inflated random variables along w/ Burr and GammaCount.
Its author has a PR to add Burr to Distributions.jl

@tamasgal
https://github.com/JuliaHEP/LandauDistribution.jl

1 Like

Thanks @Albert_Zevelev! I am already using it and have also contributed :) It will be released today as a Julia package.

2 Likes

Another cheat sheet to compare basic distribution usage with R and Python:
https://github.com/sylvaticus/commonDistributionsInJuliaPythonR

2 Likes

See my WIP:

@mlkrock added a repo w/ a 7-parameter distribution

One of the nice features of Distributions.jl is the ability to create new transformed distributions from existing distributions (a runnable sketch follows the list below).

  • MixtureModel([Normal(0,1),Cauchy(0,1)], [0.5,0.5]) returns a new random variable

  • Truncated(Cauchy(0,1), 0.25, 1.8)

  • convolve(Cauchy(0,1), Cauchy(5,2))
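
A minimal sketch of the three calls above, assuming a current Distributions.jl (where the lowercase truncated function is the successor of the Truncated constructor); the numbers are only illustrative:

```julia
using Distributions

# Two-component mixture of a standard normal and a standard Cauchy
mix = MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5])
pdf(mix, 0.3)    # density of the mixture at 0.3
rand(mix, 5)     # five draws from the mixture

# Cauchy restricted to the interval [0.25, 1.8]
tr = truncated(Cauchy(0, 1), 0.25, 1.8)
cdf(tr, 1.0)

# Sum of two independent Cauchy variables is again a Cauchy
conv = convolve(Cauchy(0, 1), Cauchy(5, 2))    # Cauchy(5.0, 3.0)
```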

A recent PR proposes folded distributions.
This is cool b/c it automatically gives the user access to a large number of distributions:
folded Cauchy, folded normal, half-Cauchy, half-logistic, half-normal, etc.
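
Until such a wrapper exists, the zero-location special cases are already reachable by truncating a symmetric, zero-centred distribution at zero; a small sketch assuming current Distributions.jl:

```julia
using Distributions

# Folding a symmetric distribution centred at 0 is the same as truncating
# it to [0, Inf), so half-normal and half-Cauchy need no dedicated wrapper.
half_normal = truncated(Normal(0, 1), 0, Inf)
half_cauchy = truncated(Cauchy(0, 1), 0, Inf)

pdf(half_normal, 0.5)    # equals 2 * pdf(Normal(0, 1), 0.5)
```

This only covers the symmetric-about-zero cases; a folded distribution with nonzero location is not a truncation, which is why a dedicated wrapper is useful.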

There has been discussion about a generic ZeroInflated distribution here, here, here

Are there other important transformations of random variables not considered yet?
Maybe CensoredDistribution, Conditioned & Derived Statistical Distributions can provide some inspiration?

2 Likes

The way Julia handles this stuff is just miles ahead of other languages, thanks to first-class structs etc. In R you would have to write all the rfoo, pfoo, dfoo functions even if they are trivially derived from something else. In Stan you have to write your own logpdf functions as well; the ability to just say something like convolve(A, B) is truly fabulous.

I should probably include distributions in my tutorial vignettes I’m working on.

4 Likes

I think a good way to increase confidence in the correctness of our ecosystem is to implement more/better tests of systemically important packages such as Distributions.jl.

E.g.
Popoviciu’s inequality: for any bounded univariate random variable $X \in [m, M]$ we have $\sigma^{2} \leq \frac{1}{4}(M-m)^2$

Maybe some kind of loop over all univariate distributions in the pkg that checks if the RV is bounded & if various inequalities hold?
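
A minimal sketch of such a loop, with a hand-picked list of distributions standing in for a full enumeration of the package:

```julia
using Distributions, Test

# Popoviciu’s inequality: var(X) ≤ (M - m)^2 / 4 for X bounded on [m, M].
# A real test suite would loop over every univariate distribution exported
# by Distributions.jl and skip the unbounded ones.
for d in (Uniform(0, 1), Beta(2, 5), Arcsine(-1, 1))
    m, M = minimum(d), maximum(d)
    if isfinite(m) && isfinite(M)
        @test var(d) <= (M - m)^2 / 4
    end
end
```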

1 Like

What does this mean? That (for large n) rand(dist, n) and rand(dist, 2n) take the same amount of time?

Statistics! I often use a t-test etc. to test deviations of a random variable from its known mean in unit testing, but it’s a bit difficult to get unit testing to work nicely with tests which may fail, just not too often.
Therefore I like your Popoviciu example, @Albert_Zevelev, because it gives a deterministic test which has a bit of slack but on the other hand should never fail. Hoeffding’s inequality also comes to mind, as do other finite-sample properties.
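
For the sampling side, one way to get a test that may fail but only with a known, tiny probability is to derive the slack from Hoeffding’s inequality; a sketch assuming a bounded distribution and a target failure probability of 1e-9:

```julia
using Distributions, Statistics, Test

# Hoeffding: for a distribution bounded on [a, b], the mean of n IID draws
# deviates from the true mean by more than t with probability at most
# 2 * exp(-2n * t^2 / (b - a)^2). Solving for t at failure probability δ
# gives a randomized test that fails spuriously with probability ≈ δ.
d = Beta(2, 5)                      # any bounded distribution works here
a, b = minimum(d), maximum(d)
n = 10_000
δ = 1e-9
t = (b - a) * sqrt(log(2 / δ) / (2n))
@test abs(mean(rand(d, n)) - mean(d)) <= t
```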

2 Likes

No, of course not. O(1) is per IID sample.

The distinction is from situations where obtaining IID samples is practically impossible or very expensive, and you have to resort to MCMC. In practice, efficient IID sampling methods exist for all univariate distributions, but for multivariate distributions cheap IID sampling is only possible in a few special cases.