HypothesisTests, rand and very low pvalue

fpoling · December 21, 2022, 9:33am

Consider the following code:

using Distributions
import HypothesisTests as HT
HT.pvalue(HT.ApproximateOneSampleKSTest(rand(Normal(), 1_000_000), Normal()))

I expected the answer to be very close to one, but depending on the run it can be anywhere from 0.01 to 0.98. Why is that? The same happens with OneSampleADTest and other tests.

Using DiscreteUniform is even worse.

HT.pvalue(HT.OneSampleADTest(rand(DiscreteUniform(1, 10), 1_000_000), DiscreteUniform(1, 10)))

routinely gives values like 6e-10.

So what is wrong here?

sijo · December 21, 2022, 9:53am

I don’t think anything is wrong: when the null hypothesis is true, the p-value is uniformly distributed on [0, 1], that is, P(p <= a) = a for any a between 0 and 1. So in particular you have a 1% chance of observing a p-value <= 0.01, and a 2% chance of observing one >= 0.98.

For example, repeating your experiment 10000 times (with a much smaller sample size to avoid long computations, but this doesn’t change the result):

using UnicodePlots
using Distributions
import HypothesisTests as HT

get_p() = HT.pvalue(HT.ApproximateOneSampleKSTest(rand(Normal(), 1000), Normal()))
p = [get_p() for _ in 1:10_000]

histogram(p)

# Output:
              ┌                                        ┐ 
   [0.0, 0.1) ┤█████████████████████████████▍ 956        
   [0.1, 0.2) ┤███████████████████████████████▌ 1 025    
   [0.2, 0.3) ┤█████████████████████████████▉ 977        
   [0.3, 0.4) ┤██████████████████████████████▏ 978       
   [0.4, 0.5) ┤████████████████████████████▉ 943         
   [0.5, 0.6) ┤███████████████████████████████▌ 1 025    
   [0.6, 0.7) ┤██████████████████████████████▏ 979       
   [0.7, 0.8) ┤██████████████████████████████▊ 1 002     
   [0.8, 0.9) ┤███████████████████████████████▉ 1 040    
   [0.9, 1.0) ┤█████████████████████████████████  1 075  
              └                                        ┘

The histogram is approximately flat, as expected, because all values of the p-value are equally likely.
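
As a rough check of the tail probabilities mentioned earlier (about 1% of p-values at or below 0.01 and about 2% at or above 0.98), here is a minimal sketch that counts them directly. It reuses the same test with fewer repetitions; the seed, the 2_000 repetitions, and the names frac_low and frac_high are arbitrary choices for illustration.

```julia
using Random
using Distributions
import HypothesisTests as HT

Random.seed!(2022)  # arbitrary seed (an assumption), only so that reruns give the same numbers

# Same test as above: the sample really is drawn from the null distribution.
get_p() = HT.pvalue(HT.ApproximateOneSampleKSTest(rand(Normal(), 1000), Normal()))
p = [get_p() for _ in 1:2_000]

# Under a true null, P(p <= a) is roughly a for any a in [0, 1].
frac_low  = count(<=(0.01), p) / length(p)   # expected to be near 0.01
frac_high = count(>=(0.98), p) / length(p)   # expected to be near 0.02

println("fraction of p-values <= 0.01: ", frac_low)
println("fraction of p-values >= 0.98: ", frac_high)
```

Both fractions should land close to 0.01 and 0.02, up to Monte Carlo noise, and they will vary from run to run if the seed is removed.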
