How to calculate confidence interval?

Ashu · May 15, 2023, 11:46am

Dear All,

I have a random sample, let’s say smpl = {1, 4, 2, 6, 10, 18, 3, 5, 20}. I wish to calculate a 99% confidence interval for [minimum(smpl), maximum(smpl)] = [1, 20]. I can calculate a 99% confidence interval for mean(smpl), but I don’t know how to do it for the lower and upper values of a sample. Could you please help me solve this? The actual sample has more than 1000 elements. I am using the Distributions.jl, IntervalArithmetic.jl, and HypothesisTests.jl packages in a Pluto.jl notebook.

Thank you in advance

johnmyleswhite · May 15, 2023, 12:25pm

https://juliastats.org/HypothesisTests.jl/stable/methods/#Confidence-interval

nilshg · May 15, 2023, 1:59pm

When you say “don’t know how to do it for the lower and upper values of a sample” I suppose you mean you want to construct a confidence interval on the population minimum and maximum from a sample? If that’s the case then that won’t be possible in general, so you might need to provide some more context for what you’re actually after.

Ashu · May 15, 2023, 2:32pm

Yes @nilshg, I want to construct a confidence interval on the population minimum and maximum from a sample. An example is shown below:

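using Distributions, IntervalArithmetic, HypothesisTests

N = 1000

# Three independent uniform inputs, N draws each
a = rand(Uniform(10, 20), N)
b = rand(Uniform(5, 10), N)
c = rand(Uniform(25, 50), N)

# Derived sample: sum of squares of the inputs
sample = a.^2 + b.^2 + c.^2

# Observed extremes of the derived sample
limits = [minimum(sample), maximum(sample)]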

I am looking for a 95% or 99% confidence interval for the values in limits = [772.775, 2889.33].

nilshg · May 15, 2023, 3:32pm

Okay, so I think what you’re trying to do doesn’t work (although maybe @johnmyleswhite has a clever idea, he’s certainly a better statistician than me!)

The population minimum (or maximum) is a unique (for a continuous distribution) value and as such a sample in and of itself without some distributional assumptions can’t really tell you much about it - the sample either includes the minimum or it doesn’t, and you don’t know which it is.

You could look into quantile estimators and confidence intervals around quantiles instead.
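As a rough illustration of the quantile suggestion, here is a minimal sketch of a distribution-free confidence interval for a population quantile based on order statistics. It assumes an i.i.d. sample from a continuous distribution, in which case the number of observations below the p-quantile is Binomial(n, p). The helper name quantile_ci is hypothetical and not part of Distributions.jl or HypothesisTests.jl.

```julia
using Distributions, Random

# Hypothetical helper (not from any package): distribution-free CI for the
# p-th population quantile, assuming an i.i.d. sample from a continuous law.
function quantile_ci(x::AbstractVector{<:Real}, p::Real; level::Real = 0.95)
    n = length(x)
    α = 1 - level
    B = Binomial(n, p)               # count of observations below the p-quantile
    l = quantile(B, α / 2)           # lower rank, so that P(B < l) <= α/2
    u = quantile(B, 1 - α / 2) + 1   # upper rank, so that P(B < u) >= 1 - α/2
    (1 <= l && u <= n) || error("sample too small for this p and level")
    xs = sort(x)
    # Exact probability that [xs[l], xs[u]) covers the quantile; at least `level`
    coverage = cdf(B, u - 1) - cdf(B, l - 1)
    return (lower = xs[l], upper = xs[u], coverage = coverage)
end

# Usage with the same data-generating process as the example in this thread
Random.seed!(1)
N = 1000
smpl = rand(Uniform(10, 20), N).^2 + rand(Uniform(5, 10), N).^2 + rand(Uniform(25, 50), N).^2

ci_low  = quantile_ci(smpl, 0.01; level = 0.99)   # CI for the 1st percentile
ci_high = quantile_ci(smpl, 0.99; level = 0.99)   # CI for the 99th percentile
```

The interval endpoints are observed sample values, so no distributional shape is assumed. The price is that p has to stay away from 0 and 1: for the extremes themselves the required ranks fall outside 1:n and the helper throws an error.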

Your example is interesting because it’s all bounded distributions, so you know the theoretical population minimum is 10^2 + 5^2 + 25^2 = 750. You can create a fake population to check how well a sample of 1,000 captures the extremes of the population distribution:

In [2]: using Distributions

In [3]: population = [rand(Uniform(10, 20))^2 + rand(Uniform(5, 10))^2 + rand(Uniform(25, 50))^2 for _ ∈ 1:1_000_000];

In [4]: sample_mins = [minimum(rand(population, 1_000)) for _ ∈ 1:10_000];

In [5]: minimum(population), minimum(sample_mins)
(754.667335428114, 754.667335428114)

Here I’ve created a population of size 1m and drawn 10,000 samples of size 1,000 from it. Indeed, at least one of these samples includes the population minimum. You can also look at the distribution of sample minimums to get a sense of how this estimator behaves; here are the 0.5th, 1st, 5th and 10th percentiles:

In [6]: quantile(sample_mins, [0.005, 0.01, 0.05, 0.1])
4-element Vector{Float64}:
 757.8221704938558
 759.6185780206479
 766.9632199757921
 771.232780534815
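A back-of-the-envelope check of why the population minimum shows up among the sample minimums above: the sketch below assumes the population minimum is a single element (true almost surely for continuous draws) and that rand(population, 1_000) samples with replacement. The variable names are only for illustration.

```julia
M    = 1_000_000   # population size
n    = 1_000       # size of each sample
reps = 10_000      # number of samples drawn

# Each draw misses the unique minimum with probability 1 - 1/M
p_one = 1 - (1 - 1 / M)^n            # one given sample contains the minimum
p_any = 1 - (1 - 1 / M)^(n * reps)   # at least one of the samples contains it
```

So any single sample of 1,000 contains the population minimum only about 0.1% of the time, while across 10 million draws in total it is almost certain to appear at least once.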

johnmyleswhite · May 16, 2023, 12:16am

You’re right - I had misunderstood the request. Statistical inference for the population minimum and maximum is very messy: the asymptotic normality you get for interior quantiles doesn’t apply cleanly, so constructing CIs is very hard. And the population minimum and maximum might not even exist without some assumptions, whereas the interior quantiles always do.
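To make the non-normality concrete, here is a small simulation sketch under a strong assumption: the data are Uniform(0, 1), so the population minimum is known to be 0. In that case the rescaled sample minimum n * X_(1) is approximately Exponential(1) for large n rather than normal. The sizes and seed are arbitrary choices for illustration.

```julia
using Distributions, Random, Statistics

Random.seed!(42)
n, reps = 1_000, 5_000

# Rescaled sample minimum of n Uniform(0, 1) draws, repeated `reps` times
scaled_mins = [n * minimum(rand(n)) for _ in 1:reps]

probs       = [0.1, 0.5, 0.9]
empirical   = quantile(scaled_mins, probs)                  # simulated quantiles
exponential = [quantile(Exponential(1), q) for q in probs]  # limiting quantiles
```

The two sets of quantiles should be close, and the limit is skewed and one-sided. For other distributions the limiting shape depends on how the density behaves near the endpoint, which is why no general recipe exists without distributional assumptions.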

The topic is tricky enough that some of the top Google results are incorrect: for example, “Distribution of the maximum of random variables” by Max R. P. Grossmann involves confusion about what a confidence interval is.
