Kolmogorov-Smirnov test

Looking at this more closely, I don’t think the problem is the choice of tails, but rather the computation of the KS statistic itself.

Compare with the implementation in scipy.stats.

HypothesisTests.jl:

julia> print(x)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]
julia> print(y)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 5, 8]
julia> t = ApproximateTwoSampleKSTest(x, y)
┌ Warning: This test is inaccurate with ties
└ @ HypothesisTests ~/.julia/packages/HypothesisTests/BgrVj/src/kolmogorov_smirnov.jl:167
Approximate two sample Kolmogorov-Smirnov test
----------------------------------------------
Population details:
    parameter of interest:   Supremum of CDF differences
    value under h_0:         0.0
    point estimate:          0.516129

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0005

Details:
    number of observations:   [31,31]
    KS-statistic:              2.032002032003047

vs. scipy.stats:

In [13]: stats.kstest(x, y, alternative="two-sided", method="asymp")
Out[13]: KstestResult(statistic=0.032258064516129115, pvalue=1.0)

In [14]: stats.kstest(x, y, alternative="greater", method="asymp")
Out[14]: KstestResult(statistic=0.032258064516129115, pvalue=0.9375209928337668)

In [15]: stats.kstest(x, y, alternative="less", method="asymp")
Out[15]: KstestResult(statistic=0.032258064516129004, pvalue=0.9375209928337671)
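As a sanity check, the two-sample KS statistic can be computed by hand: it is the supremum of the absolute difference between the two empirical CDFs, which only needs to be evaluated at the distinct observed values (so ties are handled correctly). A minimal sketch, assuming the same x and y arrays as above:

```python
import numpy as np

# Same data as in the example above (written as run-length expansions).
x = np.array([1]*16 + [2]*8 + [3]*4 + [4]*2 + [5])
y = np.array([1]*16 + [2]*9 + [3]*3 + [4, 5, 8])

# Evaluate both empirical CDFs at every distinct value in either sample.
grid = np.union1d(x, y)
cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)

# The KS statistic is the largest absolute ECDF difference.
d = np.max(np.abs(cdf_x - cdf_y))
print(d)  # 1/31 ≈ 0.0323, matching the scipy result
```

This agrees with scipy’s statistic of 0.0323 (the ECDFs differ by at most one observation out of 31), not with the 0.516 point estimate reported by HypothesisTests.jl.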