Fitting a distribution not supported by Distributions.jl

Hi,

I want to fit an analytical distribution in Julia from discrete points. For instance, this specific Levy distribution is listed in Distributions.jl, but the fit method is not supported for this type: Distribution Fitting · Distributions.jl

using Distributions

D = Levy()
x = rand(D, 1000)
d = fit(Levy, x)
ERROR: suffstats is not implemented for (Levy, Vector{Float64}).
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:35
 [2] suffstats(dt::Type{Levy}, xs::Vector{Float64})
   @ Distributions C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\genericfit.jl:5
 [3] fit_mle(dt::Type{Levy}, x::Vector{Float64})
   @ Distributions C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\genericfit.jl:27
 [4] fit(dt::Type{Levy}, x::Vector{Float64})
   @ Distributions C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\genericfit.jl:46
 [5] top-level scope
   @ REPL[20]:1

In general, how should I fit these kinds of analytic distributions?

From what I understand, fit is only implemented for some distributions especially when an efficient ML estimator is available.
Depending on your needs, you can always reach for an optimization package

using Optim, ComponentArrays

p₀ = ComponentVector(; μ = 0.0, logσ = 0.0)
mle = Optim.optimize(p -> - sum(logpdf.(Levy(p.μ, exp(p.logσ)), x)), p₀, LBFGS())

or go for a Bayesian approach

using Turing

@model function fitme(x)
    μ ~ Normal(0, 10)
    σ ~ truncated(TDist(3), lower = 0)
    x .~ Levy(μ, σ)
end

chains = sample(fitme(x), NUTS(), 1_000)
2 Likes

Thanks for the tips! I need some time to learn about the usage of these optimization packages. For instance when using Optim, how can I check the fitted parameters (in this Levy case μ and σ)?

Optim.minimizer(mle) will give you the optimized parameters. Note that I have used logσ as I like using unconstrained optimizers, thus you will need to take the exponential of the optimized value.

I have another related question to the usage of Optim. Now I wrap the calls into the following function:

function fit_levy(x)
   p₀ = ComponentVector(; μ = 0.0, logσ = 0.0)
   mle = Optim.optimize(p -> - sum(logpdf.(Levy(p.μ, exp(p.logσ)), x)), p₀, LBFGS())
   param = Optim.minimizer(mle)
   μ = param.μ
   σ = exp(param.logσ)

   μ, σ
end

I found that the optimization process is sensitive to the scales of the numbers:

julia> a = [   25.77444756189654
        1941.9959559504289
           30.90865782846204
         789.5316974005192
          67.17259874993165
             29.16640194223711
          28.6202910812033
         933.401245894768
          86.69688566840875
             39.04475622775034]
10-element Vector{Float64}:
   25.77444756189654
 1941.9959559504289
   30.90865782846204
  789.5316974005192
   67.17259874993165
   29.16640194223711
   28.6202910812033
  933.401245894768
   86.69688566840875
   39.04475622775034

julia> μ1, σ1 = fit_levy(a)
(24.207532552609223, 7.545442547618322)

julia> μ1, σ1 = fit_levy(a / 100)
(0.2420753224182992, 0.07545443480661769)

julia> μ1, σ1 = fit_levy(a / 1000)
(0.024207499550423738, 0.007545541553505919)

julia> μ1, σ1 = fit_levy(a / 10000)
ERROR: DomainError with 0.0:
Levy: the condition σ > zero(σ) is not satisfied.
Stacktrace:
  [1] #331
    @ C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\univariate\continuous\levy.jl:31 [inlined]
  [2] check_args
    @ C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\utils.jl:89 [inlined]
  [3] #Levy#330
    @ C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\univariate\continuous\levy.jl:31 [inlined]
  [4] Levy
    @ C:\Users\hyzho\.julia\packages\Distributions\YQSrn\src\univariate\continuous\levy.jl:30 [inlined]
  [5] (::var"#3#4"{Vector{Float64}})(p::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{(μ = 1, logσ = 2)}}})
    @ Main D:\Research\Turbulence\analysis\fit_distributions.jl:20
  [6] value_and_gradient!
    @ C:\Users\hyzho\.julia\packages\DifferentiationInterface\alBlj\ext\DifferentiationInterfaceFiniteDiffExt\onearg.jl:329 [inlined]
  [7] (::NLSolversBase.var"#fg!#14"{…})(_g::ComponentVector{…}, _x::ComponentVector{…})
    @ NLSolversBase C:\Users\hyzho\.julia\packages\NLSolversBase\n7XXO\src\objective_types\oncedifferentiable.jl:53
  [8] value_gradient!!(obj::OnceDifferentiable{…}, x::ComponentVector{…})
    @ NLSolversBase C:\Users\hyzho\.julia\packages\NLSolversBase\n7XXO\src\interface.jl:82
  [9] value_gradient!(obj::OnceDifferentiable{…}, x::ComponentVector{…})
    @ NLSolversBase C:\Users\hyzho\.julia\packages\NLSolversBase\n7XXO\src\interface.jl:69
 [10] value_gradient!(obj::Optim.ManifoldObjective{OnceDifferentiable{…}}, x::ComponentVector{Float64, Vector{…}, Tuple{…}})
    @ Optim C:\Users\hyzho\.julia\packages\Optim\7krni\src\Manifolds.jl:50
 [11] (::LineSearches.var"#ϕdϕ#6"{…})(α::Float64)
    @ LineSearches C:\Users\hyzho\.julia\packages\LineSearches\b4CwT\src\LineSearches.jl:83
 [12] (::LineSearches.HagerZhang{…})(ϕ::Function, ϕdϕ::LineSearches.var"#ϕdϕ#6"{…}, c::Float64, phi_0::Float64, dphi_0::Float64)
    @ LineSearches C:\Users\hyzho\.julia\packages\LineSearches\b4CwT\src\hagerzhang.jl:147
 [13] HagerZhang
    @ C:\Users\hyzho\.julia\packages\LineSearches\b4CwT\src\hagerzhang.jl:103 [inlined]
 [14] perform_linesearch!(state::Optim.LBFGSState{…}, method::LBFGS{…}, d::Optim.ManifoldObjective{…})
    @ Optim C:\Users\hyzho\.julia\packages\Optim\7krni\src\utilities\perform_linesearch.jl:58
 [15] update_state!(d::OnceDifferentiable{…}, state::Optim.LBFGSState{…}, method::LBFGS{…})
    @ Optim C:\Users\hyzho\.julia\packages\Optim\7krni\src\multivariate\solvers\first_order\l_bfgs.jl:220
 [16] optimize(d::OnceDifferentiable{…}, initial_x::ComponentVector{…}, method::LBFGS{…}, options::Optim.Options{…}, state::Optim.LBFGSState{…})
    @ Optim C:\Users\hyzho\.julia\packages\Optim\7krni\src\multivariate\optimize\optimize.jl:65
 [17] optimize
    @ C:\Users\hyzho\.julia\packages\Optim\7krni\src\multivariate\optimize\optimize.jl:43 [inlined]
 [18] #optimize#77
    @ C:\Users\hyzho\.julia\packages\Optim\7krni\src\multivariate\optimize\interface.jl:274 [inlined]
 [19] optimize
    @ C:\Users\hyzho\.julia\packages\Optim\7krni\src\multivariate\optimize\interface.jl:264 [inlined]
 [20] optimize(f::Function, initial_x::ComponentVector{…}, method::LBFGS{…})
    @ Optim C:\Users\hyzho\.julia\packages\Optim\7krni\src\multivariate\optimize\interface.jl:264
 [21] fit_levy(x::Vector{Float64})
    @ Main D:\Research\Turbulence\analysis\fit_distributions.jl:20
 [22] top-level scope
    @ REPL[32]:1
Some type information was truncated. Use `show(err)` to see complete types.

Is there a way to avoid the above condition unsatisfied error?

And yet another small question is, how can I impose constraints to the parameters when performing optimizations? For instance, in the Levy distribution case, I may wish to have a nonnegative location parameter mu.

logmu also works