What's wrong with my chi-squared goodness of fits tests?

Hi there. I am following a MOOC where I am learning goodness of fits tests, in particular the chi-square test for discrete rv and the Kolmogorov-Smirnov test for continuous rv.

I implemented the tests in the following code. It gives results that are consistent with the output of the exercises in the course, however when I simulate the test on true H0 even with LOT of repetitions, I have very low type 1 errors, much lower than the alpha level set in the test, except I am working for uniforms for which the tests accept (wrongly) H1 with a rate alpha.

Here the difinition of the tests…

using Distributions

# The test definitions.....
function computeDensity(data,support)
    counts =  [count(i -> i==s,data) for s in support]
    if length(data) > sum(counts)
        error("There are some data not in the support !")
    end
    return counts
end

""" 
    goodnessOfFitDiscrete(data,support,f₀;compressedData=true,α=0.05,d=0)

Perform a goodness to fit chi-squared test to check for a particular MMF.

The passed distribution must support the method `pdf(dist,x)` for the provided support.
Hâ‚€ can be either the PDF with a specified set of parameters or the PDF in general. In this case the distribution object should be passed to this function with the MLE estimators that best fit the data (it is NOT done inside this function). In such case  the `d` parameter should be set to the number of estimated parameters in order to remove the `d` degree of freedom from the chi-square test.
"""
function goodnessOfFitDiscrete(data,support,f₀;compressedData=true,α=0.05,d=0)
    if !compressedData
        data   = computeDensity(data,support)
    end
    K          = length(support)
    N          = sum(data)
    p̂          = data ./ N
    df         = K - d - 1
    p0         = pdf.(fâ‚€,support)
    T          = N * sum((p̂[k] - p0[k])^2/p0[k] for k in 1:K)
    χDist      = Chisq(df)
    rejectedH₀ = T > quantile(χDist,1-α)
    p_value    = 1 - cdf(χDist,T)
    return (testValue=T, threashold=quantile(χDist,1-α),rejectedH₀=rejectedH₀, p_value=p_value)
end

When I use the Uniform distribution, I obtain the expected number of errors:

fâ‚€          = DiscreteUniform(1,100)
repetitions = 5000
support     = 1:100
outs        = fill(false,repetitions)
for rep in 1:repetitions
    data      = rand(fâ‚€,10000)
    out       = goodnessOfFitDiscrete(data,support,f₀,compressedData=false,α=0.05,d=0)
    outs[rep] = out.rejectedHâ‚€
end
sum(outs)/repetitions # around 0.05

However when I use an other distribution, even a simmetric and unimodal one like the Binomial, I have much lower numbers:

fâ‚€          = Binomial(100,0.7)
repetitions = 5000
support     = 0:100
outs        = fill(false,repetitions)
for rep in 1:repetitions
    data      = rand(fâ‚€,10000)
    out       = goodnessOfFitDiscrete(data,support,f₀,compressedData=false,α=0.05,d=0)
    outs[rep] = out.rejectedHâ‚€
end
sum(outs)/repetitions # 0.02 or 0.03 instead of 0.05

Crazy enough, if I did try to increase alpha to e.g. 0.1 or 0.2, I can see that the output of the test when using the uniform adapt and match the new alpha, but for the binomial it remains to very low levels… even if I set my test to reject H0 even with a very high error probability, it almost never does!

Is there an “official” implementation of the chi-square test for goodness of fit I can try with ?

It seems I am giving the same results as the test on HyphothesisTesting.jl package:

But this means that the power of the test, in the worst case of the true parameter different but very close to the H0 one, is lower than the expected one, i.e. the test is more conservative than what theoretically should be…

If I remember correctly, the test assumes an expected frequency of 5 or more for 80% or more categories. Does that hold in your simulation?

1 Like

I wasn’t aware of that heuristics.
My Binomial(100,0.7) on x <= 19 has a PMF of just 6.689300172065137e-26 or less, so even if I sample 2000 observations I will likely have no observations on x <= 19.

Thanks for the info!

No problem. In that case, a common solution is to combine categories so that the assumptions of the test are satisfied. The sources I found were not clear, but correcting the categories might get you closer to the theoretical alpha of .05.

1 Like

Additional details

The chi-square test can only be used if the distribution of testValues is approximated by the chi-square distribution.

In the standard goodness-of-fit test, the degree of freedom of the chi-square distribution should be equal to (the number of elements in the support of the discrete distribution to be tested) - 1.

However, the real (or substantial) support of Binomial(100, 0.7) is at most between 40 and 100. So, in this case, we can expect that the degree of freedom should be roughly around 60.

Let’s check this in the following.

The mathematically idealized support of DiscreteUniform(0, 100) and Binomial(100, 0.7) are both 0:100, but the real support of the latter is much smaller than the former.

using Distributions, StatsPlots
P = plot(DiscreteUniform(0, 100); label="DiscreteUniform(0, 100)", xtick=0:10:100, ylim=(0, 0.013))
Q = plot(Binomial(100, 0.7); label="Binomial(100, 0.7)", legend=:topleft, xtick=0:10:100)
plot(P, Q; size=(800, 250))

using Distributions, Random, StatsPlots

# The test definitions.....
function computeDensity(data, supp)
    counts =  [count(i -> i==s,data) for s in supp]
    if length(data) > sum(counts)
        error("There are some data not in the support !")
    end
    return counts
end

"""
Modified version of `goodnessOfFitDiscrete` function in
https://discourse.julialang.org/t/whats-wrong-with-my-chi-squared-goodness-of-fits-tests/64334

* The old argument `support` is deleted.
* The new local variable `supp` is defined to be `support(fâ‚€)`
"""
function goodnessOfFitDiscrete(data, f₀; compressedData=true, α=0.05, d=0)
    supp       = support(fâ‚€)
    if !compressedData
        data   = computeDensity(data, supp)
    end
    K          = length(supp)
    N          = sum(data)
    p̂          = data ./ N
    df         = K - d - 1
    p0         = pdf.(fâ‚€,supp)
    T          = N * sum((p̂[k] - p0[k])^2/p0[k] for k in 1:K)
    χDist      = Chisq(df)
    rejectedH₀ = T > quantile(χDist, 1-α)
    p_value    = 1 - cdf(χDist, T)
    return (testValue=T, threashold=quantile(χDist,1-α), rejectedH₀=rejectedH₀, p_value=p_value)
end

function repeat_tests(f₀; datasize = 10000, repetitions = 10000, α = 0.05, d = 0)
    testValue = zeros(repetitions)
    rejectedHâ‚€ = falses(repetitions)
    data = rand(fâ‚€, datasize)
    for rep in 1:repetitions
        rand!(fâ‚€, data)
        out = goodnessOfFitDiscrete(data, f₀; compressedData=false, α, d)
        testValue[rep] = out.testValue
        rejectedHâ‚€[rep] = out.rejectedHâ‚€
    end
    α_real = sum(rejectedH₀)/repetitions
    (; α_real, testValue, rejectedH₀)
end

function plot_testValue_dist(f₀; datasize = 10000, repetitions = 10000, α = 0.05, d=0)
    α_real, testValue, rejectedH₀ = repeat_tests(f₀; datasize, repetitions, α, d)

    title = "$f₀   (real α = $α_real / nominal α = $α)"
    plot(; title, titlefontsize=10)

    xlim=(0, 2length(support(fâ‚€)))
    a, b = xlim
    histogram!(testValue[a .≤ testValue .≤ b]; norm=true, alpha=0.3, label="testValue", xlim)

    df = length(support(fâ‚€)) - d - 1
    plot!(Chisq(df), a, b; label="Chisq(df = $df)", ls=:dash, lw=2)
end

In the DiscreteUniform(0, 100) case, the distribution of testValues is almost exactly equal to the chi-square distribution.

plot_testValue_dist(DiscreteUniform(0, 100))

2021-07-10 (1)

But, in the Binomial(100, 0.7) case, this is not the case. The real (or empirical) α value is much smaller than the nominal α value.

plot_testValue_dist(Binomial(100, 0.7))

2021-07-10 (2)

When the degree of freedom is set to 60, the empirical α value is close to the nominal α value 0.05.

plot_testValue_dist(Binomial(100, 0.7); d = 40)

3 Likes