Thanks for noticing ThorinDistributions.jl! However, it is not yet usable and still a very unstable project. Someday it might get better, though.
I started working on an unpublished package on random matrices a few weeks ago.
The goal is to gradually improve it over the next few months as I learn more about Julia while working on this project, and hopefully it will be something worth releasing by the end of the summer.
LRMoE.jl has several zero-inflated random variables, along with Burr and GammaCount.
Its author has a PR to add Burr to Distributions.jl.
Thanks @Albert_Zevelev! I am already using it and also contributed :)
it will be released today as a Julia package.
Another cheatsheet comparing basic distribution usage with R and Python:
https://github.com/sylvaticus/commonDistributionsInJuliaPythonR
See my WIP:
One of the nice features of Distributions.jl is the ability to create new transformed distributions from existing distributions.

Each of these returns a new random variable:

- `MixtureModel([Normal(0,1), Cauchy(0,1)], [0.5, 0.5])`
- `Truncated(Cauchy(0,1), 0.25, 1.8)`
- `convolve(Cauchy(0,1), Cauchy(5,2))`
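As a minimal sketch (assuming Distributions.jl is loaded; the evaluation points are arbitrary), each transformed distribution supports the usual API:

```julia
using Distributions

# Equal-weight mixture of a Normal and a Cauchy
m = MixtureModel([Normal(0, 1), Cauchy(0, 1)], [0.5, 0.5])
pdf(m, 0.0)   # density of the mixture at 0

# Cauchy truncated to [0.25, 1.8]
t = Truncated(Cauchy(0, 1), 0.25, 1.8)
rand(t)       # samples always fall in [0.25, 1.8]

# Sum of two independent Cauchys (itself a Cauchy)
c = convolve(Cauchy(0, 1), Cauchy(5, 2))
```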
A recent PR proposes folded distributions.
This is cool because it automatically gives the user access to a large number of distributions:
folded Cauchy, folded normal, half-Cauchy, half-logistic, half-normal, etc.
There has been discussion about a generic ZeroInflated distribution here, here, here
Are there other important transformations of random variables not considered yet?
Maybe CensoredDistribution, Conditioned & Derived Statistical Distributions can provide some inspiration?
The way Julia handles this is miles ahead of other languages, thanks to first-class structs and the like. In R you would have to write all the `rfoo`, `pfoo`, `dfoo` functions even when they are trivially derived from something else; in Stan you have to write your own logpdf functions as well. The ability to just say `convolve(A, B)` is truly fabulous.
I should probably include distributions in the tutorial vignettes I'm working on.
I think a good way to increase confidence in the correctness of our ecosystem is to implement more/better tests of systemically important packages such as Distributions.jl.
E.g. Popoviciu's inequality: for any bounded univariate random variable X \in [m, M] we have \sigma^{2} \leq \frac{1}{4}(M - m)^{2}.
Maybe some kind of loop over all univariate distributions in the package that checks whether the RV is bounded and whether various inequalities hold?
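A hedged sketch of such a loop (the distribution list here is just illustrative, not the package's full univariate catalogue):

```julia
using Distributions

# Check Popoviciu's inequality, var(X) ≤ (M - m)²/4, for a few
# univariate distributions; skip unbounded ones.
for d in (Uniform(0, 1), Beta(2, 5), Arcsine(0, 2), Normal(0, 1))
    m, M = minimum(d), maximum(d)
    isfinite(m) && isfinite(M) || continue   # RV must be bounded
    @assert var(d) <= (M - m)^2 / 4
end
```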
What does this mean? That (for large n) `rand(dist, n)` and `rand(dist, 2n)` take the same amount of time?
Statistics! I often use t-tests etc. to check deviations of a random variable from its known mean in unit testing, but it's a bit difficult to make unit testing work nicely with tests that may fail, just not too often.
Therefore I like your example with Popoviciu, @Albert_Zevelev, because it gives a deterministic test that has a bit of slack but should never fail. Hoeffding's inequality also comes to mind, as do other finite-sample properties.
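For the statistical flavor, a hedged sketch using HypothesisTests.jl (assumed available; the seed, sample size, and significance threshold are arbitrary choices):

```julia
using Distributions, HypothesisTests, Random

# Test that the sampler's mean is consistent with the known mean.
# Such a test can fail by pure chance, so fix the seed and use a very
# small significance threshold to keep false failures rare.
Random.seed!(42)
d = Gamma(2, 3)
x = rand(d, 10_000)
@assert pvalue(OneSampleTTest(x, mean(d))) > 1e-6
```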
No, of course not. O(1) is per IID sample.
The distinction is from situations where obtaining IID samples is practically impossible or very expensive and you have to resort to MCMC. In practice, efficient IID sampling methods exist for essentially all common univariate distributions, but for multivariate distributions cheap IID sampling is only possible in a few special cases.
I'm confused by the use of big-O notation here. In my understanding, big-O is used to express that the computation time depends on some parameter of the problem, most often its "size." For example, if you were to generate a sample from the `Binomial(n, p)` distribution by calling `sum(rand() < p for _ in 1:n)`, then this is an O(n) algorithm because its computation time grows linearly in n.
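To illustrate the contrast (a sketch; Distributions.jl's `Binomial` sampler uses specialized algorithms whose cost does not grow with n):

```julia
using Distributions

n, p = 10^6, 0.3

# O(n): simulate n Bernoulli trials one by one
s1 = sum(rand() < p for _ in 1:n)

# Cost roughly independent of n: a dedicated Binomial sampler
s2 = rand(Binomial(n, p))
```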
If your random variable is defined by a complicated stochastic process, for example X = f(Y, u) where Y is the RNG state and u is a vector of parameters, then it may take a long time to compute f, but this sampling time is typically constant in u, no?
I know it's a little off topic, but if it's not too much to ask, I'd appreciate an example.
Many of these special cases correspond to copula models, for which standard sampling tools (and more!) are now available in Copulas.jl. Disclaimer: I'm the main author of the package.
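A hedged sketch of what that looks like (API names assumed from Copulas.jl and may differ across versions; the marginals and correlation are arbitrary):

```julia
using Copulas, Distributions

# Bivariate model: Gaussian copula tying together Gamma and Normal marginals
C = GaussianCopula([1.0 0.5; 0.5 1.0])
D = SklarDist(C, (Gamma(2, 3), Normal(0, 1)))
x = rand(D, 1000)   # 2×1000 matrix of IID draws
```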
It is not meant literally, just ignore it and focus on the practical part (which is, again: for some distributions, you can draw IID samples cheaply, but generally you cannot).