Random variables in Julia (working list)

Thanks for noticing ThorinDistributions.jl! However, it is not yet usable and is still a very unstable project. Someday it might get better though :wink:

2 Likes

:disappointed_relieved: I started working on an unpublished package on random matrices a few weeks ago.
The goal is to gradually improve it over the next few months as I learn more about Julia while working on this project. Hopefully it will be something worth releasing by the end of the summer.

https://github.com/weiyang2048/RandomMatrix.jl

1 Like

LRMoE.jl has several zero-inflated random variables along with Burr and GammaCount.
Its author has a PR to add Burr to Distributions.jl.

@tamasgal
https://github.com/JuliaHEP/LandauDistribution.jl

1 Like

Thanks @Albert_Zevelev! I am already using it and have also contributed :) It will be released today as a Julia package.

2 Likes

Another cheat sheet comparing basic distribution usage with R and Python:
https://github.com/sylvaticus/commonDistributionsInJuliaPythonR

2 Likes

See my WIP:

@mlkrock added a repo with a 7-parameter distribution.

One of the nice features of Distributions.jl is the ability to create new transformed distributions from existing ones (a sketch follows the list below).

  • MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5]) returns a new random variable that mixes the two components

  • truncated(Cauchy(0, 1), 0.25, 1.8) restricts the Cauchy to the interval [0.25, 1.8]

  • convolve(Cauchy(0, 1), Cauchy(5, 2)) is the distribution of the sum of the two independent variables
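
For instance, a minimal sketch of all three in action (using the current Distributions.jl API, where lowercase truncated is the documented constructor):

```julia
using Distributions

# 50/50 mixture of a standard normal and a standard Cauchy
m = MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5])

# Cauchy restricted to the interval [0.25, 1.8]
t = truncated(Cauchy(0, 1), 0.25, 1.8)

# distribution of the sum of two independent Cauchy variables
c = convolve(Cauchy(0, 1), Cauchy(5, 2))

rand(m, 5), pdf(t, 1.0), params(c)   # all behave like ordinary distributions
```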

A recent PR proposes folded distributions.
This is cool because a single generic wrapper automatically gives the user access to a large number of distributions:
folded Cauchy, folded normal, half-Cauchy, half-logistic, half-normal, etc.
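
To see why one wrapper covers so many cases: folding a distribution just means taking |X|, so the density at x ≥ 0 collects the mass from both x and -x. A hypothetical sketch (the name Folded and this interface are made up for illustration; the actual PR may look different):

```julia
using Distributions, Random

# Hypothetical Folded wrapper for |X| (illustrative only, not the PR's API)
struct Folded{D<:ContinuousUnivariateDistribution} <: ContinuousUnivariateDistribution
    d::D
end

# density at x ≥ 0 sums the contributions from x and -x
Distributions.pdf(f::Folded, x::Real) = x < 0 ? zero(float(x)) : pdf(f.d, x) + pdf(f.d, -x)

# sampling is just the absolute value of a sample from the base distribution
Base.rand(rng::Random.AbstractRNG, f::Folded) = abs(rand(rng, f.d))

half_cauchy = Folded(Cauchy(0, 1))   # folding a zero-centered Cauchy gives a half-Cauchy
pdf(half_cauchy, 1.0)                # twice the Cauchy density at 1
```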

There has been discussion about a generic ZeroInflated distribution here, here, and here.

Are there other important transformations of random variables not considered yet?
Maybe CensoredDistribution, Conditioned & Derived Statistical Distributions can provide some inspiration?

2 Likes

The way Julia handles this stuff is miles ahead of other languages, thanks to first-class structs etc. In R you would have to write all the rfoo/pfoo/dfoo functions even if they are trivially derived from something else. In Stan you have to write your own logpdf functions as well. The ability to just say convolve(A, B) is truly fabulous.

I should probably include distributions in the tutorial vignettes I'm working on.

4 Likes

I think a good way to increase confidence in the correctness of our ecosystem is to implement more/better tests of systemically important packages such as Distributions.jl.

E.g.
Popoviciu's inequality: for any bounded univariate random variable X \in [m, M] we have \sigma^{2} \leq \frac{1}{4}(M-m)^2

Maybe some kind of loop over all univariate distributions in the package that checks whether the RV is bounded and whether various inequalities hold? A sketch is below.
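
A minimal sketch of such a test, with an illustrative hand-picked list of bounded distributions rather than a programmatic enumeration of the package:

```julia
using Distributions, Test

# Popoviciu: for X ∈ [m, M], var(X) ≤ (M - m)^2 / 4
bounded_dists = [Uniform(0, 1), Beta(2, 3), Arcsine(-1, 1), TriangularDist(0, 2, 1)]

@testset "Popoviciu's inequality" begin
    for d in bounded_dists
        m, M = minimum(d), maximum(d)
        isfinite(m) && isfinite(M) || continue  # skip unbounded supports
        @test var(d) <= (M - m)^2 / 4
    end
end
```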

1 Like

What does this mean? That (for large n) rand(dist, n) and rand(dist, 2n) take the same amount of time?

Statistics! I often use t-tests etc. to test deviations of a random variable from its known mean in unit testing, but it's a bit difficult to get unit testing to play nicely with tests that may fail, just not too often.
That is why I like your Popoviciu example, @Albert_Zevelev: it gives a deterministic test which has a bit of slack but on the other hand should never fail. Hoeffding's inequality (sketched below) also comes to mind, as do other finite-sample properties.
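
For example, Hoeffding's inequality turns a sampling test into an almost-deterministic one: pick the tolerance so the failure probability is negligible. A minimal sketch (the distribution and constants here are illustrative):

```julia
using Distributions, Test

# Hoeffding: for X ∈ [m, M], P(|X̄ₙ - μ| ≥ t) ≤ 2 exp(-2 n t² / (M - m)²)
d = Beta(2, 3)
n = 10_000
m, M = minimum(d), maximum(d)
δ = 1e-9                               # acceptable failure probability
t = (M - m) * sqrt(log(2 / δ) / (2n))  # largest deviation Hoeffding allows at level δ

@test abs(mean(rand(d, n)) - mean(d)) < t
```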

2 Likes

No, of course not. O(1) is per IID sample.

The contrast is with situations where obtaining IID samples is practically impossible or very expensive, and you have to resort to MCMC. In practice, efficient IID sampling methods exist for all univariate distributions, but for multivariate distributions cheap IID sampling is only possible in a few special cases.

I'm confused by the use of big-O notation here. In my understanding, big-O is used to express that the computation time depends on some parameter of the problem, most often its "size." For example, if you were to generate a sample from the binomial(p, n) distribution by calling sum(rand() < p for _ in 1:n), then this is an O(n) algorithm because its computation time grows linearly in n.
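
Continuing that binomial example, one quick way to see the contrast (assuming BenchmarkTools.jl is available; as far as I know, Distributions.jl switches to a rejection-based Binomial sampler whose expected cost is roughly constant for large n):

```julia
using Distributions, BenchmarkTools

# Naive O(n) sampler: sum n Bernoulli draws
naive_binomial(n, p) = sum(rand() < p for _ in 1:n)

@btime naive_binomial(10^6, 0.3)    # cost grows linearly in n
@btime rand(Binomial(10^6, 0.3))    # expected cost roughly constant in n
```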

If your random variable is defined by a complicated stochastic process, for example X = f(Y, u) where Y is the RNG state and u is a vector of parameters, then it may take a long time to compute f, but this sampling time is typically constant in u, no?

I know itā€™s a little off topic, but if itā€™s not too much to ask, Iā€™d appreciate an example.

1 Like

Many of these special cases correspond to copula models, for which standard sampling tools (and more!) are now available in Copulas.jl. Disclaimer: I'm the main author of the package :slight_smile:
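
For instance, a small sketch along the lines of the Copulas.jl README: couple two fixed marginals with a Gaussian copula via Sklar's theorem and draw IID samples (the marginals and correlation here are arbitrary):

```julia
using Copulas, Distributions

# Gaussian copula with correlation 0.5, tying together two arbitrary marginals
C = GaussianCopula([1.0 0.5; 0.5 1.0])
D = SklarDist(C, (Gamma(2, 3), Normal(0, 1)))

x = rand(D, 1000)   # 2×1000 matrix of IID bivariate samples
```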

1 Like

It is not meant literally; just ignore it and focus on the practical part (which is, again: for some distributions you can draw IID samples cheaply, but in general you cannot).