Random variables in Julia (working list)

Thanks for noticing ThorinDistributions.jl! However, it is not yet usable and is still a very unstable project. Someday it might get better though :wink:

2 Likes

:disappointed_relieved: I started working on an unpublished package on random matrices a few weeks ago.
The goal is to gradually improve it over the next few months as I learn more about Julia while working on this project. Hopefully it will be something worth releasing by the end of the summer.

https://github.com/weiyang2048/RandomMatrix.jl

1 Like

LRMoE.jl has several zero-inflated random variables along with Burr and GammaCount.
Its author has a PR to add Burr to Distributions.jl.

@tamasgal
https://github.com/JuliaHEP/LandauDistribution.jl

1 Like

Thanks @Albert_Zevelev! I am already using it and have also contributed :) It will be released today as a Julia package.

2 Likes

Another cheat sheet comparing basic distribution usage with R and Python:
https://github.com/sylvaticus/commonDistributionsInJuliaPythonR

2 Likes

See my WIP:

@mlkrock added a repo with a 7-parameter distribution.

One of the nice features of Distributions.jl is the ability to create new transformed distributions from existing ones (a sketch follows the list below).

  • MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5]) returns a new random variable that mixes the two components

  • truncated(Cauchy(0, 1), 0.25, 1.8) restricts the Cauchy to the interval [0.25, 1.8]

  • convolve(Cauchy(0, 1), Cauchy(5, 2)) is the distribution of the sum of the two independent variables
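
For instance, a minimal sketch of all three in action (using the current Distributions.jl API, where lowercase truncated is the documented constructor):

```julia
using Distributions

# 50/50 mixture of a standard normal and a standard Cauchy
m = MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5])

# Cauchy restricted to the interval [0.25, 1.8]
t = truncated(Cauchy(0, 1), 0.25, 1.8)

# distribution of the sum of two independent Cauchy variables
c = convolve(Cauchy(0, 1), Cauchy(5, 2))

rand(m, 5), pdf(t, 1.0), params(c)   # all behave like ordinary distributions
```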

A recent PR proposes folded distributions.
This is cool because a single generic wrapper automatically gives the user access to a large number of distributions:
folded Cauchy, folded normal, half-Cauchy, half-logistic, half-normal, etc.
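
To see why one wrapper covers so many cases: folding a distribution just means taking |X|, so the density at x ≥ 0 collects the mass from both x and -x. A hypothetical sketch (the name Folded and this interface are made up for illustration; the actual PR may look different):

```julia
using Distributions, Random

# Hypothetical Folded wrapper for |X| (illustrative only, not the PR's API)
struct Folded{D<:ContinuousUnivariateDistribution} <: ContinuousUnivariateDistribution
    d::D
end

# density at x ≥ 0 sums the contributions from x and -x
Distributions.pdf(f::Folded, x::Real) = x < 0 ? zero(float(x)) : pdf(f.d, x) + pdf(f.d, -x)

# sampling is just the absolute value of a sample from the base distribution
Base.rand(rng::Random.AbstractRNG, f::Folded) = abs(rand(rng, f.d))

half_cauchy = Folded(Cauchy(0, 1))   # folding a zero-centered Cauchy gives a half-Cauchy
pdf(half_cauchy, 1.0)                # twice the Cauchy density at 1
```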

There has been discussion about a generic ZeroInflated distribution here, here, and here.

Are there other important transformations of random variables not considered yet?
Maybe CensoredDistribution, Conditioned & Derived Statistical Distributions can provide some inspiration?

2 Likes

The way Julia handles this stuff is miles ahead of other languages, thanks to first-class structs etc. In R you would have to write all the rfoo/pfoo/dfoo functions even if they are trivially derived from something else. In Stan you have to write your own logpdf functions as well. The ability to just say convolve(A, B) is truly fabulous.

I should probably include distributions in the tutorial vignettes I'm working on.

4 Likes

I think a good way to increase confidence in the correctness of our ecosystem is to implement more/better tests of systemically important packages such as Distributions.jl.

E.g.
Popoviciu's inequality: for any bounded univariate random variable X \in [m, M] we have \sigma^{2} \leq \frac{1}{4}(M-m)^2

Maybe some kind of loop over all univariate distributions in the package that checks whether the RV is bounded and whether various inequalities hold? A sketch is below.
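
A minimal sketch of such a test, with an illustrative hand-picked list of bounded distributions rather than a programmatic enumeration of the package:

```julia
using Distributions, Test

# Popoviciu: for X ∈ [m, M], var(X) ≤ (M - m)^2 / 4
bounded_dists = [Uniform(0, 1), Beta(2, 3), Arcsine(-1, 1), TriangularDist(0, 2, 1)]

@testset "Popoviciu's inequality" begin
    for d in bounded_dists
        m, M = minimum(d), maximum(d)
        isfinite(m) && isfinite(M) || continue  # skip unbounded supports
        @test var(d) <= (M - m)^2 / 4
    end
end
```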

1 Like

What does this mean? That (for large n) rand(dist, n) and rand(dist, 2n) take the same amount of time?

Statistics! I often use t-tests etc. to test deviations of a random variable from its known mean in unit testing, but it's a bit difficult to get unit testing to play nicely with tests that may fail, just not too often.
That is why I like your Popoviciu example, @Albert_Zevelev: it gives a deterministic test which has a bit of slack but on the other hand should never fail. Hoeffding's inequality (sketched below) also comes to mind, as do other finite-sample properties.
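
For example, Hoeffding's inequality turns a sampling test into an almost-deterministic one: pick the tolerance so the failure probability is negligible. A minimal sketch (the distribution and constants here are illustrative):

```julia
using Distributions, Test

# Hoeffding: for X ∈ [m, M], P(|X̄ₙ - μ| ≥ t) ≤ 2 exp(-2 n t² / (M - m)²)
d = Beta(2, 3)
n = 10_000
m, M = minimum(d), maximum(d)
δ = 1e-9                               # acceptable failure probability
t = (M - m) * sqrt(log(2 / δ) / (2n))  # largest deviation Hoeffding allows at level δ

@test abs(mean(rand(d, n)) - mean(d)) < t
```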

2 Likes

No, of course not. O(1) is per IID sample.

The contrast is with situations where obtaining IID samples is practically impossible or very expensive, and you have to resort to MCMC. In practice, efficient IID sampling methods exist for all univariate distributions, but for multivariate distributions cheap IID sampling is only possible in a few special cases.

I'm confused by the use of big-O notation here. In my understanding, big-O is used to express that the computation time depends on some parameter of the problem, most often its "size." For example, if you were to generate a sample from the binomial(p, n) distribution by calling sum(rand() < p for _ in 1:n), then this is an O(n) algorithm because its computation time grows linearly in n.
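
Continuing that binomial example, one quick way to see the contrast (assuming BenchmarkTools.jl is available; as far as I know, Distributions.jl switches to a rejection-based Binomial sampler whose expected cost is roughly constant for large n):

```julia
using Distributions, BenchmarkTools

# Naive O(n) sampler: sum n Bernoulli draws
naive_binomial(n, p) = sum(rand() < p for _ in 1:n)

@btime naive_binomial(10^6, 0.3)    # cost grows linearly in n
@btime rand(Binomial(10^6, 0.3))    # expected cost roughly constant in n
```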

If your random variable is defined by a complicated stochastic process, for example X = f(Y, u) where Y is the RNG state and u is a vector of parameters, then it may take a long time to compute f, but this sampling time is typically constant in u, no?

I know itā€™s a little off topic, but if itā€™s not too much to ask, Iā€™d appreciate an example.

1 Like

Many of these special cases correspond to copula models, for which standard sampling tools (and more!) are now available in Copulas.jl. Disclaimer: I'm the main author of the package :slight_smile:
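
For instance, a small sketch along the lines of the Copulas.jl README: couple two fixed marginals with a Gaussian copula via Sklar's theorem and draw IID samples (the marginals and correlation here are arbitrary):

```julia
using Copulas, Distributions

# Gaussian copula with correlation 0.5, tying together two arbitrary marginals
C = GaussianCopula([1.0 0.5; 0.5 1.0])
D = SklarDist(C, (Gamma(2, 3), Normal(0, 1)))

x = rand(D, 1000)   # 2×1000 matrix of IID bivariate samples
```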

1 Like

It is not meant literally; just ignore it and focus on the practical part (which is, again: for some distributions you can draw IID samples cheaply, but in general you cannot).