How to calculate confidence interval?

Dear All,

I have a random sample, let’s say smpl = {1, 4, 2, 6, 10, 18, 3, 5, 20}. I wish to calculate a 99% confidence interval on [minimum(smpl), maximum(smpl)] = [1, 20]. I am able to calculate a 99% confidence interval for mean(smpl), but I don’t know how to do it for the lower and upper values of a sample. Could you please help me solve this? The actual sample has more than 1000 elements. I am using the Distributions.jl, IntervalArithmetic.jl, and HypothesisTests.jl packages in a Pluto.jl notebook.

Thank you in advance

https://juliastats.org/HypothesisTests.jl/stable/methods/#Confidence-interval

When you say “don’t know how to do it for the lower and upper values of a sample” I suppose you mean you want to construct a confidence interval on the population minimum and maximum from a sample? If that’s the case then that won’t be possible in general, so you might need to provide some more context for what you’re actually after.

Yes @nilshg, I want to construct a confidence interval on the population minimum and maximum from a sample. An example is shown below:

using Distributions, IntervalArithmetic, HypothesisTests

N = 1000

a = rand(Uniform(10, 20), N);
b = rand(Uniform(5, 10), N);
c = rand(Uniform(25, 50), N);

sample = a.^2 + b.^2 + c.^2

limits = [minimum(sample), maximum(sample)]

I am looking for a 95% or 99% confidence interval on the values in limits = [772.775, 2889.33].

Okay, so I think what you’re trying to do doesn’t work (although maybe @johnmyleswhite has a clever idea, he’s certainly a better statistician than me!)

The population minimum (or maximum) is a unique (for a continuous distribution) value and as such a sample in and of itself without some distributional assumptions can’t really tell you much about it - the sample either includes the minimum or it doesn’t, and you don’t know which it is.

You could look into quantile estimators and confidence intervals around quantiles instead.
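To make the quantile suggestion concrete, here is a minimal sketch of a distribution-free confidence interval for a population quantile, built from order statistics. It assumes i.i.d. draws from a continuous distribution, so that the number of observations below the true p-quantile follows Binomial(n, p). The helper `quantile_ci` is a hypothetical name written for this post, not a function from Distributions.jl or HypothesisTests.jl, and the 1% and 99% quantiles are only one possible stand-in for "lower" and "upper" values.

```julia
using Distributions, Random

# Hypothetical helper (not a library function): distribution-free CI for the
# p-th population quantile, using order statistics of the sample.
# Assumes i.i.d. draws from a continuous distribution.
function quantile_ci(x::AbstractVector{<:Real}, p::Real; level::Real = 0.99)
    n = length(x)
    xs = sort(x)
    α = 1 - level
    B = Binomial(n, p)              # count of observations below the true quantile
    l = quantile(B, α / 2)          # index of the lower order statistic
    u = quantile(B, 1 - α / 2) + 1  # index of the upper order statistic
    lower = l ≥ 1 ? xs[l] : -Inf    # unbounded if no order statistic qualifies
    upper = u ≤ n ? xs[u] : Inf
    coverage = cdf(B, u - 1) - cdf(B, l - 1)  # actual coverage, at least `level`
    return (lower = lower, upper = upper, coverage = coverage)
end

Random.seed!(1)
N = 1000
smpl = rand(Uniform(10, 20), N).^2 .+ rand(Uniform(5, 10), N).^2 .+ rand(Uniform(25, 50), N).^2

ci_low  = quantile_ci(smpl, 0.01)   # 99% CI for the 1% quantile
ci_high = quantile_ci(smpl, 0.99)   # 99% CI for the 99% quantile
println(ci_low)
println(ci_high)
```

As p moves toward 0 or 1, the required order-statistic index eventually falls outside 1:n and the interval becomes unbounded on that side, which is one way to see why the same construction does not give a finite interval for the minimum or maximum itself.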

Your example is interesting because all the inputs are bounded distributions, so you know the theoretical population minimum is 10^2 + 5^2 + 25^2 = 750. You can create a fake population to check how well a sample of 1,000 captures the extremes of the population distribution:

In [2]: using Distributions

In [3]: population = [rand(Uniform(10, 20))^2 + rand(Uniform(5, 10))^2 + rand(Uniform(25, 50))^2 for _ ∈ 1:1_000_000];

In [4]: sample_mins = [minimum(rand(population, 1_000)) for _ ∈ 1:10_000];

In [5]: minimum(population), minimum(sample_mins)
(754.667335428114, 754.667335428114)

Here I’ve created a population of size 1 million and drawn 10,000 samples of size 1,000 from it. Indeed, at least some of these samples include the population minimum. You can also look at the distribution of sample minimums to get a sense of how this estimator behaves. Here are the 0.5th, 1st, 5th and 10th percentiles:

In [6]: quantile(sample_mins, [0.005, 0.01, 0.05, 0.1])
4-element Vector{Float64}:
 757.8221704938558
 759.6185780206479
 766.9632199757921
 771.232780534815
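Because the bounds of the three uniform inputs are known in this example, the sample extremes can also be compared directly with the theoretical limits: 750 from above, and 20^2 + 10^2 + 50^2 = 3000 by the same arithmetic. The following is only a rough sketch; the helper `f`, the seed, and the variable names are assumptions introduced for illustration.

```julia
using Distributions, Random

Random.seed!(42)
N = 1000

# Sum of squares, as in the example above
f(a, b, c) = a^2 + b^2 + c^2

# Theoretical population limits implied by the bounds of the uniform inputs
theoretical = (f(10, 5, 25), f(20, 10, 50))   # (750, 3000)

smpl = f.(rand(Uniform(10, 20), N), rand(Uniform(5, 10), N), rand(Uniform(25, 50), N))

# How far the sample extremes fall short of the theoretical limits
gap_low  = minimum(smpl) - theoretical[1]
gap_high = theoretical[2] - maximum(smpl)
println((gap_low, gap_high))
```

Both gaps are non-negative by construction: the sample range can only sit inside the population range, so the sample minimum and maximum are biased inward as estimates of the true limits.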

You’re right - I had misunderstood the request. Statistical inference for the population minimum and maximum is very messy: the asymptotic normality you get for interior quantiles doesn’t apply cleanly, so constructing CIs is very hard. And the population minimum and maximum might not even exist without some assumptions, whereas the interior quantiles always do.

The topic is tricky enough that some of the top Google results are incorrect: for example, Max R. P. Grossmann: Distribution of the maximum of random variables involves confusion about what a confidence interval is.
