How can I automatically find all univariate continuous cdfs in Distributions.jl?
I believe this would make many tasks much easier, including maintaining Distributions.jl.
For example, in MLJ.jl:
using MLJ; X, y = @load_boston;
m = models()   # vector of all 132 models
m = models(matching(X, y))   # vector of 53 models that work with the data
m = models(matching(X, y), x -> x.prediction_type == :deterministic)   # vector of 50 models
In Distributions.jl:
using Distributions, Random;
Random.seed!(123);
Currently, Distributions.continuous_distributions gives a Vector{String} with 48 elements. We cannot use it directly because the entries do not match the type names: "arcsine" should be Arcsine, "betaprime" should be BetaPrime, etc.
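One possible way to get the type names programmatically, instead of from those strings, is to ask Julia for the subtypes of the abstract type. This is a minimal sketch, assuming `subtypes` from the InteractiveUtils standard library and the exported alias `ContinuousUnivariateDistribution`; the variable names are only illustrative, and the result may contain types that are not standalone parametric families, so some filtering is probably needed.

```julia
using Distributions
using InteractiveUtils: subtypes   # standard library (already loaded in the REPL)

# Types that subtype Distribution{Univariate,Continuous}.
cts_types = subtypes(ContinuousUnivariateDistribution)

# Their names as strings, comparable to a hand-made list of names.
cts_names = sort!(string.(nameof.(cts_types)))

println(length(cts_names), " types found")
```

Working with the types themselves (rather than strings) also removes the need for `Meta.parse` and `eval` later on.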
I scraped some from the repo:
Cts_Uni =["Arcsine",
"Beta",
"BetaPrime",
"Cauchy",
"Chernoff",
"Chi",
"Chisq",
"Cosine",
"Epanechnikov",
"Erlang",
"Exponential",
"FDist",
"Frechet",
"Gamma",
"GeneralizedExtremeValue",
"GeneralizedPareto",
"Gumbel",
"InverseGamma",
"InverseGaussian",
"Laplace",
"Levy",
"Logistic",
"LogNormal",
"NoncentralBeta",
"NoncentralChisq",
"NoncentralF",
"NoncentralT",
"Normal",
"NormalInverseGaussian",
"NormalCanon",
"Pareto",
"Rayleigh",
"StudentizedRange",
"SymTriangularDist",
"TDist",
"TriangularDist",
"TruncatedNormal",
"Uniform",
"VonMises",
"Weibull"];
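Er = [];      # (name, exception) pairs for distributions that fail
D_mean = [];  # (name, mean) pairs for distributions that succeed
for d in Cts_Uni
    println(d)
    try
        # Build the default-parameter instance, e.g. "Normal()" -> Normal()
        d0 = "$(d)()" |> Meta.parse |> eval
        μ = mean(d0)
        push!(D_mean, (d, μ))
    catch e
        println("!!! Error ", d, e)
        push!(Er, (d, e))
    end
end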
- This shows automatically that 13 of the 40 distributions don't have default parameters.
Example: Chi() gives MethodError: no method matching Chi()
- 7 of the 27 distributions with default parameters give mean = NaN or mean = Inf.
E.g. BetaPrime(), Cauchy(), Pareto()…
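The split between finite and non-finite means can be made explicit with `isfinite`. A small sketch on a handful of types from the list that do have a zero-argument constructor; the selection and the variable names are only for illustration.

```julia
using Distributions

# A few types from the list that have default parameters.
examples = [Normal, Exponential, BetaPrime, Cauchy, Pareto]

# Pair each type name with the mean of its default-parameter instance.
default_means = [(nameof(T), mean(T())) for T in examples]

# Separate finite means from NaN / Inf ones.
finite_means    = filter(p -> isfinite(p[2]), default_means)
nonfinite_means = filter(p -> !isfinite(p[2]), default_means)

println("finite:     ", first.(finite_means))
println("non-finite: ", first.(nonfinite_means))
```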
Er = [];
D_ent = [];
for d in Cts_Uni
    println(d)
    try
        d0 = "$(d)()" |> Meta.parse |> eval
        ε = entropy(d0)
        push!(D_ent, (d, ε))
    catch e
        println("!!! Error ", d, e)
        push!(Er, (d, e))
    end
end
5 of the 27 distributions with default parameters don't have entropy implemented.
(The call doesn't return NaN or Inf; it just throws an error.)
Perhaps a friendlier message would help: “The entropy for this distribution has not been coded. Please submit a PR.”
The same goes for quantiles:
Er = [];
D_q = [];
for d in Cts_Uni
    println(d)
    try
        d0 = "$(d)()" |> Meta.parse |> eval
        q = quantile(d0, 0.025)
        push!(D_q, (d, q))
    catch e
        println("!!! Error ", d, e)
        push!(Er, (d, e))
    end
end
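The three loops above differ only in the statistic being computed, so they could be folded into one helper that takes the statistic as an argument. A sketch under the assumption that the distributions are passed as types rather than strings (which avoids `Meta.parse` and `eval`); `probe` is a hypothetical name, not part of Distributions.jl.

```julia
using Distributions

# Hypothetical helper: apply `stat` to the default-parameter instance of each
# type, collecting successes and failures separately.
function probe(stat, types)
    results = Tuple{Symbol,Any}[]
    errors  = Tuple{Symbol,Any}[]
    for T in types
        try
            push!(results, (nameof(T), stat(T())))
        catch err
            push!(errors, (nameof(T), err))
        end
    end
    return results, errors
end

types = [Normal, Cauchy, Chi]   # Chi has no zero-argument constructor

D_mean, Er_mean = probe(mean, types)
D_ent,  Er_ent  = probe(entropy, types)
D_q,    Er_q    = probe(d -> quantile(d, 0.025), types)
```

Each call returns the computed values and the caught exceptions, so the same summary (how many succeed, how many fail) is available for any statistic.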
Suppose I have some data and I want to find the probability distribution that best fits it.
DGP_True = LogNormal(-1.5);
const ds = rand(DGP_True, 1_000);   # Training data.
x = range(-2, stop=3, length=200);  # Test data.
#
Er = [];
D_fit = [];
for d in Cts_Uni
    println(d)
    try
        dd = "$(d)" |> Meta.parse |> eval   # the distribution type, e.g. Normal
        D̂ = fit(dd, ds)
        Score = loglikelihood(D̂, ds)  # TODO: compute a better score.
        push!(D_fit, [d, D̂, Score])
    catch e
        println(e, d)
        push!(Er, (d, e))
    end
end
#
Er      # distributions that could not be fitted
D_fit   # [name, fitted distribution, score] for the rest
a = hcat(D_fit...)
M_names = a[1,:]
M_fit = a[2,:]
M_scores = a[3,:]
idx = sortperm(M_scores, rev=true)   # highest log-likelihood first
Dfit_sort = hcat(M_names[idx], M_fit[idx], M_scores[idx])
#
using Plots
plot( x, pdf.(DGP_True, x), label="DGP_True")
plot!(x, pdf.(Dfit_sort[1,2], x), label=Dfit_sort[1,1])
plot!(x, pdf.(Dfit_sort[2,2], x), label=Dfit_sort[2,1])
plot!(x, pdf.(Dfit_sort[3,2], x), label=Dfit_sort[3,1])
plot!(x, pdf.(Dfit_sort[4,2], x), label=Dfit_sort[4,1])
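On the TODO about a better score: raw log-likelihood tends to favor families with more parameters, so one common option is an information criterion such as AIC = 2k - 2 log L. A sketch, assuming k can be taken as `length(params(d))` (this counts every parameter that `params` returns, including any that `fit` may have held fixed); `aic` is a hypothetical helper, not a Distributions.jl function, and the four candidates are chosen only for illustration.

```julia
using Distributions, Random

Random.seed!(123)
data = rand(LogNormal(-1.5), 1_000)

# Hypothetical helper: Akaike information criterion (lower is better).
aic(d, xs) = 2 * length(params(d)) - 2 * loglikelihood(d, xs)

candidates = [LogNormal, Gamma, Exponential, Normal]
fits   = [fit(T, data) for T in candidates]
scores = [aic(d, data) for d in fits]

# Rank candidates from best (lowest AIC) to worst.
order = sortperm(scores)
for i in order
    println(rpad(string(nameof(candidates[i])), 12), round(scores[i], digits=1))
end
```

Other scores (BIC, held-out log-likelihood, CRPS) would slot into the same place by swapping out `aic`.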
There is a growing, beautiful literature on probabilistic forecasting that could benefit greatly from more structure in Distributions.jl:
ngboost, CatBoostLSS, XGBoostLSS, Gamlss, GamboostLSS, bamlss, disttree
Do you have ideas for better ways to organize Distributions.jl?