Fisher's test p-value results appear to differ from matlab, R


#1

Hi,
I was using HypothesisTests.jl and noticed minor, but possibly important, differences in the p-value between Julia and two other packages when using Fisher’s exact test. In this example, MATLAB and R users might claim a significant result, whereas the Julia user might not. This is a fairly common test, and it would be great if the results agreed. Thanks for considering this comment for discussion.
Scott

Julia (v0.6):

julia> FisherExactTest(59, 335, 172, 1366)
Fisher's exact test
-------------------
Population details:
    parameter of interest:   Odds ratio
    value under h_0:         1.0
    point estimate:          1.3984544219625261
    95% confidence interval: (0.9980930945998796, 1.9393947540537153)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.051329212328076565    <------------------------

Details:
    contingency table:
         59   335
        172  1366

MATLAB:

>> x = table([59;172],[335;1366])
x =
  2×2 table
    Var1    Var2
    ____    ____
     59      335
    172     1366

>> [h,p,stats]=fishertest(x,'Tail','both','Alpha',0.95)

h =
  logical
   1

p =
   0.045036387203992    <--------------------------------

stats = 
  struct with fields:
             OddsRatio: 1.398715723707046
    ConfidenceInterval: [1.384515635855077 1.413061452741955]

R:

> x = matrix(c(59,172,335,1366), nrow = 2)
> x
     [,1] [,2]
[1,]   59  335
[2,]  172 1366
> fisher.test(x,alternative = "two.sided")

	Fisher's Exact Test for Count Data

data:  x
p-value = 0.04503639                   <------------------------------
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.9980904309 1.9393926010
sample estimates:
 odds ratio 
1.398453903

#2

I’m afraid I can’t help with the statistics question, but it will be easier for others to read your post and help you if you quote your code with backticks.


#3

I might look into it, but from a quick look, there are adjustments to the test for GLMs, such as those for count data, that make the proper test one-tailed. It might be that other software usually takes those cases into account, whereas Julia might not be aware that this is a count model when the user requests a two-tailed version.


#4

For asymmetric distributions, p-values for two-sided alternatives are not well defined. This has been discussed a couple of times before. See https://github.com/JuliaStats/HypothesisTests.jl/issues/98#issuecomment-301532196 and the reference there. See also the docstring for the pvalue method, which states:

For tail = :both, possible values for method are:

    •    :central (default): Central interval, i.e. the p-value is two times the minimum of the
        one-sided p-values.

    •    :minlike: Minimum likelihood interval, i.e. the p-value is computed by summing all
        tables with the same marginals that are equally or less probable:

          p_ω = \sum_{f_ω(i)≤ f_ω(a)} f_ω(i)

so you can get the p-value used by R by specifying method, i.e. with x = FisherExactTest(59, 335, 172, 1366):

julia> pvalue(x, method = :minlike)
0.04503638720401171

This value is not more correct than the other one.
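
To make the two definitions concrete, here is a minimal from-scratch sketch in Python (standard library only) that computes both two-sided p-values directly from the hypergeometric distribution under the null odds ratio of 1. It is not how HypothesisTests.jl, R or MATLAB implement the test; the function name `fisher_two_sided` is made up for illustration, and the formulas simply follow the docstring quoted above.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Return (central, minlike) two-sided p-values for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalised hypergeometric weights for every table with the same margins,
    # kept as exact integers so the comparisons below have no rounding error.
    weight = {i: comb(col1, i) * comb(n - col1, row1 - i) for i in range(lo, hi + 1)}
    total = sum(weight.values())

    # :central -> twice the smaller one-sided tail, capped at 1.
    left = sum(w for i, w in weight.items() if i <= a)    # P(X <= a) * total
    right = sum(w for i, w in weight.items() if i >= a)   # P(X >= a) * total
    central = min(1.0, 2 * min(left, right) / total)

    # :minlike -> total probability of tables no more likely than the observed one.
    minlike = sum(w for w in weight.values() if w <= weight[a]) / total
    return central, minlike

central, minlike = fisher_two_sided(59, 335, 172, 1366)
print(f"central: {central:.5f}")
print(f"minlike: {minlike:.5f}")
```

For the table in this thread, the first number should agree with Julia's default (about 0.0513) and the second with the R and MATLAB value (about 0.0450). Both come from the same null distribution and differ only in how the two tails are combined.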


#5

I’m curious, does R or Matlab give the option of calculating the :central value that is the default for Julia?


#6

Robin, thanks for the advice on posting. Jose, thanks for the advice on modeling. And Andreas, thank you for a great explanation. I was easily able to reproduce the MATLAB/R results. For a newbie like me who only used the inline REPL help, an additional sentence there giving a little more detail on how alternative p-values can be obtained would be helpful!


#7

R does (my first post, I hope I got it right!):

> library(exact2x2)
> x = matrix(c(59,172,335,1366), nrow = 2)
> x
     [,1] [,2]
[1,]   59  335
[2,]  172 1366
> 
> exact2x2(x,tsmethod="central")

	Central Fisher's Exact Test

data:  x
p-value = 0.05133
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.9980904 1.9393926
sample estimates:
odds ratio 
  1.398454