Questions about contributing to Distributions.jl

Albert_Zevelev · June 18, 2020, 12:50am

I’m a little confused about what’s going on.

StatsFuns.jl currently has 14 core distributions.
Each distribution has exactly 10 properties: pdf/cdf/invcdf …
It says:
“We recommend using the Distributions.jl package for a more convenient interface.”
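For comparison, here is a minimal sketch of the two interfaces for a single distribution, assuming both packages are installed. StatsFuns.jl exposes plain functions that take the parameters positionally (`normpdf`, `normcdf`, `norminvcdf`). Distributions.jl instead wraps the parameters in a distribution object and dispatches generic functions (`pdf`, `cdf`, `quantile`) on it. The parameter values are arbitrary.

```julia
using StatsFuns, Distributions

μ, σ = 1.0, 2.0   # arbitrary Normal parameters
x, p = 0.5, 0.975 # evaluation point and probability level

# StatsFuns.jl: one function per (distribution, property) pair, parameters passed positionally
sf_pdf = normpdf(μ, σ, x)
sf_cdf = normcdf(μ, σ, x)
sf_q   = norminvcdf(μ, σ, p)

# Distributions.jl: build the distribution object once, then call generic functions on it
d = Normal(μ, σ)
ds_pdf = pdf(d, x)
ds_cdf = cdf(d, x)
ds_q   = quantile(d, p)   # the inverse cdf is called `quantile` in Distributions.jl

println((sf_pdf, ds_pdf), (sf_cdf, ds_cdf), (sf_q, ds_q))
```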
Distributions.jl has ≈ 80 distributions, with several more in progress.
Each supports a different set of properties.
Some properties are not defined for a given distribution; others make sense but still need to be implemented.
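One way to see which properties are implemented for which distribution is to call each generic function and catch `MethodError`. This is a minimal sketch: `property_table` is a hypothetical helper, and the distributions and functions listed are arbitrary choices. Some properties are defined but return `NaN` rather than throwing (for example, the mean of a Cauchy), so a `missing` entry only flags methods that are not implemented at all.

```julia
using Distributions

# Hypothetical helper: apply each property function to each distribution and
# record the result, or `missing` if no method is implemented (MethodError).
function property_table(dists, funcs)
    table = Dict{Tuple{Symbol,Symbol},Any}()
    for d in dists, f in funcs
        key = (nameof(typeof(d)), nameof(f))  # e.g. (:Normal, :mean)
        table[key] = try
            f(d)
        catch e
            e isa MethodError || rethrow()   # only swallow "not implemented" errors
            missing
        end
    end
    return table
end

dists = [Normal(0.0, 1.0), Cauchy(0.0, 1.0), LogNormal(0.0, 1.0)]
funcs = [mean, var, median, entropy]
tbl = property_table(dists, funcs)

# Print the table sorted by (distribution, property)
for (k, v) in sort(collect(tbl); by = first)
    println(k, " => ", v)
end
```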
Distributions.jl lets users create new distributions from existing ones via truncation, mixtures, products, convolution…
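A minimal sketch of building new distributions from existing ones, assuming a recent Distributions.jl release in which `truncated`, `MixtureModel`, `product_distribution` and `convolve` are all available. The parameter values are arbitrary.

```julia
using Distributions

# Truncation: restrict a standard normal to the interval [-1, 1]
tn = truncated(Normal(0.0, 1.0), -1.0, 1.0)

# Mixture: 30% weight on N(0, 1) and 70% weight on N(5, 1)
mix = MixtureModel([Normal(0.0, 1.0), Normal(5.0, 1.0)], [0.3, 0.7])

# Product: independent univariate components stacked into a 2-dimensional distribution
pd = product_distribution([Normal(0.0, 1.0), Normal(2.0, 1.0)])

# Convolution: distribution of X + Y for independent X ~ N(0, 1) and Y ~ N(1, 2)
cv = convolve(Normal(0.0, 1.0), Normal(1.0, 2.0))

println(mean(mix), " ", mean(pd), " ", mean(cv), " ", var(cv))
```

For two normals the convolution is again normal, with the means and the variances added.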
One advantage of keeping all distributions in the same repo is how easy it is to access them.
Suppose I have a new dataset and want to see which “name-brand” distribution fits it best. With a single package, I can automatically fit all relevant distributions.

Code to fit all relevant distributions in Distributions.jl

using Distributions, Random, HypothesisTests
using InteractiveUtils  # provides `subtypes` when running as a script (the REPL loads it automatically)

Uni = subtypes(UnivariateDistribution)
#Cts_Uni = subtypes(ContinuousUnivariateDistribution)
DGP_True = LogNormal(17, 7)
Random.seed!(123)
const d_train = rand(DGP_True, 1_000)
const d_test  = rand(DGP_True, 1_000)

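Er = []; D_fit = []   # failed types and successful fits
for d in Uni
    println(d)
    try
        # Rebuild the type from its printed name before passing it to `fit`
        dd = "$(d)" |> Meta.parse |> eval
        D̂ = fit(dd, d_train)                             # fit on the training sample
        Score = [loglikelihood(D̂, d_test),               # held-out log-likelihood
                 OneSampleADTest(d_test, D̂)            |> pvalue,
                 ApproximateOneSampleKSTest(d_test, D̂) |> pvalue,
                 ExactOneSampleKSTest(d_test, D̂)       |> pvalue,
                 #PowerDivergenceTest(d_test, lambda=1)  Not working!!!
                 JarqueBeraTest(d_test)                |> pvalue   # tests normality of d_test only; does not use D̂
        ]
        #Score = loglikelihood(D̂, ds) #TODO: compute a better score.
        push!(D_fit, [d, D̂, Score])
    catch e
        println(e, d)   # types without a usable `fit` method (or whose fit fails on this data) land here
        push!(Er, (d, e))
    end
end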

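a = hcat(D_fit...)                   # 3×N: names, fitted distributions, score vectors
M_names = a[1,:]; M_fit = a[2,:]; M_scores = a[3,:]
idx = sortperm(M_scores, rev=true)   # score vectors compare lexicographically: log-likelihood first
Dfit_sort = hcat(M_names[idx], M_fit[idx], M_scores[idx])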

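Output (each score vector is [test-set log-likelihood, Anderson-Darling p-value, approximate KS p-value, exact KS p-value, Jarque-Bera p-value]):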

julia> Dfit_sort
11×3 Array{Any,2}:
 LogNormal              …  [-20600.7, 0.823809, 0.789128, 0.781033, 0.0]
 Gamma                     [-21159.4, 6.0e-7, 2.45426e-68, 1.23247e-69, 0.0]
 Cauchy                    [-24823.3, 6.0e-7, 2.91142e-213, 8.6107e-227, 0.0]
 InverseGaussian           [-26918.1, 6.0e-7, 0.0, 0.0, 0.0]
 Exponential               [-33380.3, 6.0e-7, 0.0, 0.0, 0.0]
 Normal                 …  [-40611.5, 6.0e-7, 1.32495e-213, 3.51792e-227, 0.0]
 Rayleigh                  [-61404.6, 6.0e-7, 0.0, 0.0, 0.0]
 Laplace                   [-2.03419e9, 6.0e-7, 1.49234e-138, 5.47197e-144, 0.0]
 DiscreteNonParametric     [-Inf, 6.0e-7, 0.197933, 0.193494, 0.0]
 Pareto                    [-Inf, 6.0e-7, 6.69184e-108, 3.7704e-111, 0.0]
 Uniform                …  [-Inf, 6.0e-7, 0.0, 0.0, 0.0]
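Because `sortperm` compares the score vectors lexicographically, the ranking above is by held-out log-likelihood first, with the later entries only breaking ties (as in the three `-Inf` rows). Here is a minimal sketch of the same idea without `subtypes` and `eval`: fit a hand-picked list of candidate types and rank them by held-out log-likelihood. `rank_fits` is a hypothetical helper, the candidate list is an arbitrary choice, and the synthetic data use `LogNormal(0, 1)` with 10_000 points rather than the post's `LogNormal(17, 7)` with 1_000.

```julia
using Distributions, Random

# Hypothetical helper: fit each candidate type on `train`, score it by
# log-likelihood on `test`, and return (fit, score) pairs sorted best-first.
function rank_fits(candidates, train, test)
    results = Tuple{Distribution,Float64}[]
    for D in candidates
        d̂ = try
            fit(D, train)
        catch err
            @warn "fit failed" D err
            continue   # skip types that cannot be fitted to this data
        end
        push!(results, (d̂, loglikelihood(d̂, test)))
    end
    return sort!(results; by = last, rev = true)
end

Random.seed!(123)
truth = LogNormal(0.0, 1.0)   # assumed data-generating process for this demo
train = rand(truth, 10_000)
test  = rand(truth, 10_000)

ranked = rank_fits([Normal, LogNormal, Gamma, Exponential], train, test)
for (d̂, ll) in ranked
    println(rpad(string(nameof(typeof(d̂))), 12), round(ll; digits = 1))
end
```

Listing the candidates explicitly avoids looping over the many subtypes that have no `fit` method, at the cost of having to maintain the list by hand.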
