A comparison of common distributions in Julia, Python and R

Common distributions in Julia, Python and R

Please report errors on GitHub - sylvaticus/commonDistributionsInJuliaPythonR: Common probability distributions in Julia, Python and R

Loading packages

  • Julia: using Distributions
  • Python: from scipy import stats
  • R: library(extraDistr)

Discrete distributions

  • Discrete Uniform : Complete ignorance
  • Bernoulli : Single binary trial
  • Binomial : Number of successes in independent binary trials
  • Categorical : Individual categorical trial
  • Multinomial : Number of successes of the various categories in independent multinomial trials
  • Geometric : Number of independent binary trials until (and including) the first success (discrete time to first success)
  • Hypergeometric : Number of successes sampling without replacement from a bin with given initial number of items representing successes
  • Multivariate hypergeometric : Number of elements sampled in the various categories from a bin without replacement
  • Poisson : Number of independent arrivals in a given period given their average rate per that period length
  • Pascal : Number of independent binary trials until (and including) the n-th success (discrete time to n-th success).
Name Parameters Support PMF Expectations Variance CDF
D. Unif a,b ∈ Z with b ≧ a x \in \{a,a+1,...,b\} \frac{1}{b-a+1} \frac{a+b}{2} \frac{(b-a)(b-a+2)}{12} \frac{x-a+1}{b-a+1}
Bern p ∈ [0,1] x ∈ {0,1} p^x(1-p)^{1-x} p p(1-p) \sum_{i=0}^x p^i(1-p)^{1-i}
Bin p ∈ [0,1], n in N⁺ x \in \{0,...,n\} {{n} \choose {x}} p^x(1-p)^{1-x} np n p(1-p) \sum_{i=0}^{x} {{n} \choose {i}} p^i(1-p)^{1-i}
Cat p_1,p_2,...,p_K with p_k \in [0,1] and \sum_{k=1}^K p_k =1 x ∈ {1,2,…,K} \prod_{k=1}^K p_k^{\mathbb{1}(k=x)}
Multin n, p_1,p_2,...,p_K with p_k \in [0,1], \sum_{k=1}^K p_k =1 and n \in N^+ x \in \mathbb{N}_{0}^K {{n} \choose {x_1, x_2,...,x_K}} \prod_{k=1}^K p_k^{x_K}
Geom p ∈ [0,1] x ∈ N⁺ (1-p)^{x-1}p \frac{1}{p} \frac{1-p}{p^2} 1-(1-p)^x
Hyperg n_s,n_f, n \in \mathbb{N}_{0} x \in \mathbb{N}_{0} with x \leq n_s \frac{{n_s \choose x} {n_f \choose n-x} }{ (n_s + n_f) \choose n } n \frac{n_s}{n_s+n_f} n\frac{n_s}{n_s+n_f}\frac{n_f}{n_s+n_f}\frac{n_s+n_f+n}{n_s+n_f+1}
Multiv hyperg n_1,n_2,...,n_K, n with n \in \mathbb{N}_{+}, n_i \in \mathbb{N}_{0} x \in \mathbb{N}_{0}^K with x_i \leq n_i ~ \forall i, \sum_{i=1}^K x_i = n \frac{\prod_{i=1}^K {n_i \choose x_i} }{ \sum_{i=1}^K n_i \choose n } n\frac{n_i}{\sum_{i=1}^K n_i} n\frac{\sum_{j=1}^K n_j - n}{\sum_{j=1}^K n_j - 1} \frac{n_i}{\sum_{j=1}^K n_j} \left(1 - \frac{n_i}{\sum_{j=1}^K n_j} \right)
Pois λ in R⁺ x ∈ N₀ \frac{\lambda^xe^{-\lambda}}{x!} \lambda \lambda
Pasc n ∈ N⁺, p in [0,1] x in N⁺ {x-1 \choose n-1} p^n (1-p)^{x-n} \frac{n}{p} \frac{n(1-p)}{p^2}
Distribution Julia Python (stats.[distributionName]) R
Discrete uniform DiscreteUniform(lRange,uRange) randint(lRange,uRange) dunif(lRange,uRange)
Bernoulli Bernoulli(p) bernoulli(p) bern(p)
Binomial Binomial(n,p) binom(n,p) binom(n,p)
Categorical Categorical(ps) Not Av. cat(ps)
Multinomial Multinomial(n, ps) multinomial(n, ps) mnom(n,ps)
Geometric Geometric(p) geom(p) geom(p)
Hypergeometric Hypergeometric(nS, nF, nTrials) hypergeom(nS+nF,nS,nTrials) hyper(nS, nF, nTrias)
Mv hypergeometric Not Av. multivariate_hypergeom(initialNByCat,nTrials) mvhyper(initialNByCat,nTrials)
Poisson Poisson(rate) poisson(rate) pois(rate)
Negative Binomial NegativeBinomial(nSucc,p) nbinom(nSucc,p) nbinom(nSucc,p)

Continuous distributions

  • Uniform complete ignorance, pick at random, all equally likely outcomes
  • Exponential waiting time to first event whose rate is λ (continuous time to first success)
  • Normal The asymptotic distribution of a sample means
  • Erlang Time of the n-th arrival
  • Cauchy The ratio of two independent zero-means normal r.v.
  • Chi-squared The sum of the squared of iid standard normal r.v.
  • T distribution The distribution of a sample means
  • F distribution : The ratio of the ratio of two indep Χ² r.v. with their relative parameter
  • Beta distribution The Beta distribution
  • Gamma distribution Generalisation of the exponential, Erlang and chi-square distributions
Name Parameters Support PMF Expectations Variance CDF
Unif a,b ∈ R with b ≧ a x ∈ [a,b] \frac{1}{b-a} \frac{a+b}{2} \frac{(b-a)^2}{12} \frac{x-a}{b-a}
Expo λ ∈ R⁺ x ∈ R⁺ \lambda e^{-\lambda x} \frac{1}{\lambda} \frac{1}{\lambda^2} 1-e^{-\lambda x}
Normal μ ∈R, σ² ∈ R⁺ x ∈ R \frac{1}{\sigma \sqrt{2 \pi}}e^\frac{-(x-\mu)^2}{2\sigma^2} \mu \sigma^2
Erlang n ∈ N⁺, λ ∈ R⁺ x ∈ Rₒ \frac{\lambda^n x^{n-1} e^{-\lambda x} }{(n - 1) !} \frac{n}{\lambda} \frac{n}{\lambda^2}
Cauchy x₀ ∈ R (location), γ ∈ R⁺ (scale) \frac{1}{\pi \gamma (1+(\frac{x-x_0}{\gamma})^2) }
Chi-sq d ∈ N⁺ x ∈ R⁺ \frac{1}{2^{}\frac{d}{2}\Gamma(\frac{d}{2})} x^{\frac{d}{2})-1}e^{-\frac{x}{2}} d 2d
T ν ∈ R⁺ x ∈ R \frac{ \Gamma(\frac{\nu +1}{2})}{\sqrt{\nu \pi} \Gamma(\frac{\nu}{2})} \left( 1 + \frac{x^2}{\nu} \right)^{- \frac{\nu + 1}{2}}
F d₁ ∈ N⁺ d₂ ∈ N⁺ x ∈ R⁺ \frac {\sqrt {\frac {(d_1 x)^{d_1} d_2^{d_2} } {(d_1 x + d_2)^{d_1 + d_2} } }} {x \mathrm {B} \left( \frac{d_1}{2},\frac {d_2}{2} \right) } \frac{d_2}{d_2 -2} for d_2 > 2 \frac{2 d_2^2 (d_1 + d_2 -2)}{d_1 (d_2 -2)^2 (d_2 -4)} for d_2 > 4
Beta α, β ∈ R⁺ x ∈ [1,0] \frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1} \frac{\alpha}{\alpha+\beta} \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
Gamma α ∈ R⁺ (shape), β ∈ R⁺ (rate) x ∈ R⁺ \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \frac{\alpha}{\beta} \frac{\alpha}{\beta^2}

Beta function : B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} = \frac{\alpha + \beta}{\alpha \beta}
Gamma function: \Gamma(x)=(x-1)! ~ \forall x \in N

Distribution Julia Python (stats.[distributionName]) R
Uniform Uniform(lRange,uRange) uniform(lRange,uRange) unif(lRange,uRange)
Exponential Exponential(rate) expon(rate) exp(rate)
Normal Normal(μ,sqrt(σsq)) norm(μ,math.sqrt(σsq)) norm(μ,sqrt(σsq))
Erlang Erlang(n,rate) erlang(n,rate) Use gamma
Cauchy Cauchy(μ, σ) cauchy(μ, σ) cauchy(μ,σ)
Chisq Chisq(df) chi2(df) chisq(df)
T Dist TDist(df) t(df) t(df)
F Dist FDist(df1, df2) f(df1, df2) f(df1,df2)
Beta Dist Beta(shapeα,shapeβ) beta(shapeα,shapeβ) beta(shapeα,shapeβ)
Gamma Dist Gamma(shapeα,1/rateβ) gamma(shapeα,1/rateβ) gamma(shapeα,1/rateβ)

Note: The Negative Binomial returns the number of failures before n successes instead of the total trials to n successes as the Pascal distribution

Usage

y = CDF(x), i.e. y ∈ [0,1]

Julia Python R
Mean mean(d) d.mean()
Variance var(d) d.var()
Median median(d) d.median()
Sample rand(d) d.rvs() r[distributionName](1,distributionParameters), e.g. runif(1,10,20)
Quantiles (F^{-1}(y)) quantile(d,y) d.ppf(y) q[distributionName](y, distributionParameters), e.g. qunif(0.2,10,20)
PDF/PMF pdf(d,x) d.pmf(x) for discrete r.v. and d.pdf(x) for continuous ones d[distributionName](x, distributionParameters), e.g. dunif(15,10,20)
CDF cdf(d,x) d.cdf(x) p[distributionName](x, distributionParameters), e.g. punif(15,10,20)
6 Likes

@cscherrer 's MeasureTheory.jl ecosystem might be another Julia option?

1 Like