Questions about contributing to Distributions.jl

I’m a little confused about what’s going on.

  1. StatsFuns.jl currently has 14 core distributions.
    Each distribution has exactly 10 properties: pdf/cdf/invcdf …
    StatsFuns.jl itself says:
    “We recommend using the Distributions.jl package for a more convenient interface.”
  2. Distributions.jl has roughly 80 distributions, with several more in progress.
    Each has a different set of properties.
    Some properties are not defined for a given distribution, and some still need to be added.
    Distributions.jl also lets users build new distributions from existing ones via truncation, mixtures, products, convolution…
  3. One advantage of keeping distributions in the same repo is how easy it is to access all of them.
    Suppose I get a new dataset and want to see which “name-brand” distribution fits it best. I can automatically fit all relevant distributions with a single package.
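To make item 2 concrete, here is a minimal sketch of the composition tools mentioned there. It assumes a recent Distributions.jl release in which `truncated`, `MixtureModel`, `product_distribution`, and `convolve` are available; the distributions and parameters are arbitrary illustrations, not taken from any dataset.

```julia
using Distributions

# Truncation: a standard normal restricted to [-1, 1].
d_trunc = truncated(Normal(0.0, 1.0), -1.0, 1.0)

# Mixture: two normal components with weights 0.3 and 0.7.
d_mix = MixtureModel([Normal(-2.0, 1.0), Normal(2.0, 1.0)], [0.3, 0.7])

# Product: independent univariate components combined into one multivariate distribution.
d_prod = product_distribution([Exponential(1.0), Exponential(2.0)])

# Convolution: the distribution of the sum of two independent normals.
d_conv = convolve(Normal(0.0, 1.0), Normal(1.0, 2.0))

# Each derived object answers the usual queries.
println(cdf(d_trunc, 1.0))   # cumulative mass at the upper truncation bound
println(mean(d_mix))         # weighted average of the component means
println(length(d_prod))      # number of components in the product
println(d_conv)              # the sum of two normals is again a normal
```

Which properties work on the derived objects depends on the pieces they are built from, which is the same unevenness described in item 2.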
Code to fit all relevant distributions in Distributions.jl
using Distributions, Random, HypothesisTests
using InteractiveUtils  # provides subtypes when run as a script rather than in the REPL

Uni = subtypes(UnivariateDistribution)
#Cts_Uni = subtypes(ContinuousUnivariateDistribution)
DGP_True = LogNormal(17,7);
Random.seed!(123);
const d_train = rand(DGP_True, 1_000)
const d_test  = rand(DGP_True, 1_000)

Er =[]; D_fit  =[];
for d in Uni
    println(d)
    try
        # d is already a type, so it can be passed to fit directly (no parse/eval round trip)
        D̂ = fit(d, d_train)
        Score = [loglikelihood(D̂, d_test),
                OneSampleADTest(d_test, D̂)            |> pvalue,
                ApproximateOneSampleKSTest(d_test, D̂) |> pvalue,
                ExactOneSampleKSTest(d_test, D̂)       |> pvalue,
                #PowerDivergenceTest(d_test,lambda=1)  Not working!!!
                JarqueBeraTest(d_test)                |> pvalue   #Only Normal 
        ];
        #Score = loglikelihood(D̂, d_test) #TODO: compute a better score.
        push!(D_fit, [d, D̂, Score])
    catch e
        println(e, d)
        push!(Er, (d,e))
    end
end

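a = hcat(D_fit...)
M_names = a[1, :]; M_fit = a[2, :]; M_scores = a[3, :]
# Score vectors compare lexicographically, so the log-likelihood (first entry) drives the ranking.
idx = sortperm(M_scores, rev=true)
# Three columns (name, fitted distribution, scores), matching the 11×3 output.
Dfit_sort = hcat(M_names[idx], M_fit[idx], M_scores[idx])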
Output
julia> Dfit_sort
11×3 Array{Any,2}:
 LogNormal              …  [-20600.7, 0.823809, 0.789128, 0.781033, 0.0]
 Gamma                     [-21159.4, 6.0e-7, 2.45426e-68, 1.23247e-69, 0.0]
 Cauchy                    [-24823.3, 6.0e-7, 2.91142e-213, 8.6107e-227, 0.0]
 InverseGaussian           [-26918.1, 6.0e-7, 0.0, 0.0, 0.0]
 Exponential               [-33380.3, 6.0e-7, 0.0, 0.0, 0.0]
 Normal                 …  [-40611.5, 6.0e-7, 1.32495e-213, 3.51792e-227, 0.0]
 Rayleigh                  [-61404.6, 6.0e-7, 0.0, 0.0, 0.0]
 Laplace                   [-2.03419e9, 6.0e-7, 1.49234e-138, 5.47197e-144, 0.0]
 DiscreteNonParametric     [-Inf, 6.0e-7, 0.197933, 0.193494, 0.0]
 Pareto                    [-Inf, 6.0e-7, 6.69184e-108, 3.7704e-111, 0.0]
 Uniform                …  [-Inf, 6.0e-7, 0.0, 0.0, 0.0]