How to fit a normal distribution to some points in the tail

lesshaste · February 27, 2020, 2:19pm

I have some empirical values for the cdf of a distribution which should be close to Normal. These are for x = 1, 2, … , 13. They are as follows:

[5.2909066017292616e-17, 2.998035847356917e-15, 2.893916126231429e-12, 2.893916126231429e-12, 5.984024927027287e-11, 9.828060393355409e-10, 1.3065783612939083e-8, 1.4263214765731293e-7, 1.2913225520247604e-6, 9.765050285646565e-6, 6.197926915196307e-5, 0.00033122740938565793, 0.001493228789321792]

I would like to find a normal distribution that fits these as well as possible. To do this I first defined a cost function and then tried to use optimize. I suspect I am doing it wrong however.

using Distributions
using Optim
empiricalcdf = [5.2909066017292616e-17, 2.998035847356917e-15, 2.893916126231429e-12, 2.893916126231429e-12, 5.984024927027287e-11, 9.828060393355409e-10, 1.3065783612939083e-8, 1.4263214765731293e-7, 1.2913225520247604e-6, 9.765050285646565e-6, 6.197926915196307e-5, 0.00033122740938565793, 0.001493228789321792]
function cdfcost(x)
    return sum((empiricalcdf .- cdf.(Normal(x[1], x[2]), collect(1:13))).^2)
end
print(optimize(cdfcost, [10.0 ,2.0], BFGS()))

This gives me:

 * Status: success

 * Candidate solution
    Minimizer: [1.00e+01, 2.00e+00]
    Minimum:   2.428780e+00

 * Found with
    Algorithm:     BFGS
    Initial Point: [1.00e+01, 2.00e+00]

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 9.24e-01 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1
    f(x) calls:    52
    ∇f(x) calls:   52

This seems to mean no optimization has occurred. What am I doing wrong and is there a simpler way to achieve the same goal?

nilshg · February 27, 2020, 2:32pm

This seems to work:

μ_hat, σ_hat = optimize(cdfcost, [-100.0, 0.01], [100.0, 10.0], [0.0, 1.0]).minimizer

plot(0.1:0.1:13, [cdf(Normal(μ_hat, σ_hat), x) for x in 0.1:0.1:13])
scatter!(1:13, empiricalcdf)

lesshaste · February 27, 2020, 2:46pm

Is the only difference that you have added a box constraint?

nilshg · February 27, 2020, 3:53pm

Well it also uses a different algorithm and starting point - it looks like this problem is not well behaved? Consider (just picked the first few examples from the docs):

μ_hat, σ_hat = optimize(cdfcost, [-100.0, 0.01], [100.0, 10.0], [0.0, 1.0]).minimizer
μ_hat2, σ_hat2 = optimize(cdfcost, [0.0, 1.0], BFGS()).minimizer
μ_hat3, σ_hat3 = optimize(cdfcost, [1.0, 0.1], LBFGS(); autodiff = :forward).minimizer
μ_hat4, σ_hat4 = optimize(cdfcost, [1.0, 0.1]).minimizer

plot(0.1:0.1:14, [cdf(Normal(μ_hat, σ_hat), x) for x in 0.1:0.1:14])
plot!(0.1:0.1:14, [cdf(Normal(μ_hat2, σ_hat2), x) for x in 0.1:0.1:14])
plot!(0.1:0.1:14, [cdf(Normal(μ_hat3, σ_hat3), x) for x in 0.1:0.1:14])
plot!(0.1:0.1:14, [cdf(Normal(μ_hat4, σ_hat4), x) for x in 0.1:0.1:14])
scatter!(1:13, empiricalcdf)

Maybe @pkofod can shed some light on what’s going on

lesshaste · February 27, 2020, 4:53pm

All that we are doing is varying the mean and variance of a normal. It feels like the optimization should be well behaved.

Tamas_Papp · February 27, 2020, 5:25pm

Not really. The small probabilities don’t identify anything strongly (0 is a very good fit, producing a small distance metric), so you get iso-cost lines.

When in doubt, plot. Eg

using Distributions, PGFPlotsX
empiricalcdf = [5.2909066017292616e-17, 2.998035847356917e-15, 2.893916126231429e-12,
                2.893916126231429e-12, 5.984024927027287e-11, 9.828060393355409e-10,
                1.3065783612939083e-8, 1.4263214765731293e-7, 1.2913225520247604e-6,
                9.765050285646565e-6, 6.197926915196307e-5, 0.00033122740938565793,
                0.001493228789321792]
cdfcost(μ, σ) = sum(abs2, empiricalcdf .- cdf.(Normal(μ, σ), collect(1:13)))

μ = range(-5, 5, length = 50)
σ = range(0.1, 2, length = 50)
@pgf Axis({ view = (0, 90), colorbar, "colormap/jet",
            xlabel = raw"$\mu$", ylabel = raw"$\sigma$"},
          Plot3({ surf, shader = "flat" },
                Table(μ, σ, cdfcost.(μ, σ'))))

Try a likelihood-based method.

Tamas_Papp · February 27, 2020, 5:38pm

FWIW, the likelihood I wrote for what I imagine is the same problem should work for the truncated data.

tbeason · February 27, 2020, 6:28pm

I’m going to just second what @Tamas_Papp has already said. The data are too far into the tail to really be informative. Based on your other-other post, you using a skew normal, that is actually only going to make the problem worse because it is actually a more flexible distribution. That means that tail outcomes are even less informative about the rest of the shape of the distribution. Regardless of the parameters of the distribution, all of the CDFs must approach zero.

This isn’t to say that you CAN’T do what you are doing. Just an explanation for why it isn’t as straightforward as you thought it might be.

EDIT: This is clear from looking at the plots on the wiki page as well. Skew normal distribution - Wikipedia

lesshaste · February 27, 2020, 7:37pm

To be fair, I didn’t really think it would be straightforward. But I was hoping that with the very high precision of BigFloat optimization might work. I have in mind that 3 points (even from the tail if one has enough precision) is enough to uniquely define a normal distribution. Perhaps 4 is sufficient to uniquely define a skew normal (I don’t have a proof for that).

Thank you so much for all the help you are giving. I hope it’s a fun problem!

tbeason · February 27, 2020, 7:40pm

I think your conjectures would be correct if indeed the points came from such distributions. But with noisy data, anything goes.

lesshaste · February 27, 2020, 7:42pm

It’s in fact worse than that. My data is in fact not noisy (I have precise rational values for the densities which I convert to BigFloat). It just doesn’t come from precisely a normal distribution. It will be close to a normal and even closer to a skew normal I suspect.

tbeason · February 27, 2020, 7:43pm

You could try some extreme value techniques perhaps. Fit some tail distribution to the data, then find a full distribution consistent with those estimates. Just a weird thought.

Tamas_Papp · February 28, 2020, 7:31am

It looks like you are trying to match arbitrary “moments” (in the broad sense) of one distribution to another by solving a nonlinear optimization problem.

Numerical problems aside, I am not sure this is a meaningful exercise: as @tbeason pointed out, the results will be somewhat arbitrary, and very sensitive to minor variations in values (even just floating point noise in the tails) and the moments you use.

Note that the method you are using here does not allow you to evaluate how good the fit is in any way that is statistically meaningful.

Also, if you have precise values from a distribution, then the best fitting distribution is (drumroll) itself. There is nothing special about “named” distribution families, especially skew normal which is not standard at all (there are many different families called “skew normal”, depending on the transformation you use for skewing).

It looks like you have a conceptual, not a computational problem. This may be a good point to step back a bit and reflect on the purpose of the exercise.

lesshaste · February 28, 2020, 8:53am

Also, if you have precise values from a distribution, then the best fitting distribution is (drumroll) itself.

I have precise values only for x = 1…13. I would like to infer the rest of the distribution and I have a strong belief in what sort of distribution it is.

Thanks for all your help so far. Your MCMC code is particularly intriguing. Although I understand MCMC mathematically I haven’t managed to understand the code yet!

pbayer · February 28, 2020, 9:43am

Just to add my two cents: I had often normal distributed data to fit. If you do a probability plot or qq-plot you see that in most cases the tails do not fit. It is extremely rare to have normal distributed data even with very good fits to have fitting tails.

Therefore I wouldn’t even try to fit this. Only if you have some additional information or assumptions eg. expected mean …, you can do something with Bayesian methods …

lesshaste · February 28, 2020, 9:54am

That’s a very good point. A minor related question, how do you specify convergence criteria when you have box constraints?

kevinczimmerman · February 28, 2020, 1:14pm

Maybe this is more for others who land on this thread in pursuit of an answer to their own question, Distributions.jl has their own distribution fitting.

Tamas_Papp · February 28, 2020, 1:20pm

I don’t think that works for this particular case — AFAICT none of the fit_mle methods support truncated normals.

Also, I think it is important to emphasize that this exercise has a conceptual problem, which should be tackled before the solution is attempted. Coding these methods in Julia is relatively easy after that.

lesshaste · February 28, 2020, 2:18pm

I may have the terminology wrong but I don’t believe I have a truncated normal. I have simply been told the densities of a normal at 13 points in the tail and I have to work out what the full distribution is (i.e. the mean and variance). I should say the optimization approach works ok but I just have to do a grid search to find the minimum error where I would have expected Optim to be able to do something cleverer.

Topic		Replies	Views
Optimization giving wildly different answers depending on box constraints Optimization (Mathematical)	2	618	February 27, 2020
Need advice when data almost (but not quite) fit log normal dist Statistics distributions	7	919	January 13, 2021
Cumulative distribution function results are not perfectly fit for simulation and analytical methods Statistics distributions , cdf	4	375	March 8, 2023
Smoothing empirical CDFs Statistics smoothing	6	504	September 21, 2022
Cdf of multivariate normal in distributions.jl Statistics	24	10417	April 24, 2023

How to fit a normal distribution to some points in the tail

Related topics