Hypothesis testing in Julia

mihrits · March 12, 2022, 9:49am

I am taking a statistical data science course that is taught using R, and I am trying to replicate all the practical exercises in Julia. Since the material is fairly basic, almost everything has been straightforward and similar to R, but I have been having some problems with hypothesis testing.

The package I am using is HypothesisTests.jl.

I don’t fully understand how ChisqTest is implemented in that package. I understand that for the goodness-of-fit test I have to provide a vector x and theta0, and that works as expected. But for the contingency table test, it seems to accept only a matrix x, not two vectors x and y (for example ChisqTest([50,100,50], [50,100,50])), even though the docs seem to suggest otherwise. The error message shows that the closest candidates are
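For reference, the goodness-of-fit form compares the observed counts x with the expected counts sum(x) * theta0 using Pearson's statistic. The plain-Julia sketch below (no packages) computes that statistic so it can be compared with ChisqTest(x, theta0); pearson_gof is a hypothetical helper, and the closed-form p-value exp(-X²/2) is valid only because three categories give 2 degrees of freedom.

```julia
# Hypothetical helper (not part of HypothesisTests.jl):
# Pearson goodness-of-fit statistic, sum over categories of (O - E)^2 / E,
# where the expected counts are E = n * theta0.
function pearson_gof(x::AbstractVector{<:Integer}, theta0::AbstractVector{<:Real})
    length(x) == length(theta0) || throw(DimensionMismatch("x and theta0 differ in length"))
    isapprox(sum(theta0), 1) || throw(ArgumentError("theta0 must sum to 1"))
    n = sum(x)
    expected = n .* theta0
    stat = sum((x .- expected) .^ 2 ./ expected)
    df = length(x) - 1
    return stat, df
end

x = [50, 100, 50]
theta0 = fill(1 / 3, 3)          # null hypothesis: all three categories equally likely
stat, df = pearson_gof(x, theta0)
# With df = 2 the chi-squared upper tail is exactly exp(-stat / 2).
p = exp(-stat / 2)
println("X² = ", stat, ", df = ", df, ", p = ", p)
```

Here the statistic is 25 on 2 degrees of freedom, so equal proportions are clearly rejected.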

ChisqTest(::AbstractVector{T}, ::AbstractVector{T}, ::Tuple{UnitRange{T}, UnitRange{T}})
ChisqTest(::AbstractVector{T}, ::AbstractVector{T}, ::T)

I don’t really understand where these come from or what values I should pass there. I tried ChisqTest([50,100,50], [50,100,50], (1:3,1:3)) and ChisqTest([50,100,50], [50,100,50], 3), but both fail with “ArgumentError: at least one entry must be positive”, which confuses me. So I guess the main question is: what is the argument y for? I thought I could use it for a contingency table test with two vectors, but I might be wrong.

There is prop.test in R for testing probabilities/proportions. RDocumentation doesn’t really explain which test is behind it. Googling has led me to believe that it is a z-test for proportions, which, if I understand correctly, isn’t available in the HypothesisTests.jl package. For the simpler case of comparing two proportions, I tried making vectors of ones and zeros with the correct proportions and then applying a two-sample z-test, but that didn’t yield the same result, so I don’t think that is the correct workaround. The R version reports a chi-squared value for the test, and I tried ChisqTest with a contingency table, which gave a p-value much closer to the R function's but not the same; and as I’ve understood, that should not be the correct approach anyway. Any suggestions on how to replicate R’s prop.test in Julia would be appreciated.

nalimilan · March 14, 2022, 8:03am

Regarding the chi-squared test, can you describe what your vectors are? The three-argument ChisqTest method expects x and y to be vectors with one value for each independent observation, and it computes the contingency table from them. The third argument gives the possible values for x and y. But it only supports ranges (so values must be contiguous), and it is undocumented. This would give for example ChisqTest([50,100,50], [50,100,50], (50:100, 50:100)). We should probably fix this by not requiring the third argument; feel free to file an issue on GitHub.
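The counting step described above can be mimicked in plain Julia to see why the attempts in the question failed. This is a sketch under an assumption: crosstab is a hypothetical helper, and, as the “at least one entry must be positive” error suggests, pairs whose values fall outside the given ranges are assumed to be skipped rather than counted. With the counts 50 and 100 treated as observed values and levels 1:3, nothing is counted and the table is all zeros.

```julia
# Hypothetical helper: cross-tabulate paired observations (x[i], y[i]) over the
# integer ranges in `levels`. Pairs with a value outside its range are skipped
# (assumed to mirror the counting behind the error in the question).
function crosstab(x::AbstractVector{<:Integer}, y::AbstractVector{<:Integer},
                  levels::Tuple{UnitRange{Int},UnitRange{Int}})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    lx, ly = levels
    table = zeros(Int, length(lx), length(ly))
    for (xi, yi) in zip(x, y)
        if xi in lx && yi in ly
            table[xi - first(lx) + 1, yi - first(ly) + 1] += 1
        end
    end
    return table
end

# The counts from the question, misread as raw observations: every value is
# outside 1:3, so the result is a 3×3 table of zeros.
empty_table = crosstab([50, 100, 50], [50, 100, 50], (1:3, 1:3))

# Raw observations: one (x, y) pair per subject, coded 1..3 and 1..2.
x = [1, 2, 2, 3, 1, 2, 3, 3]
y = [1, 1, 2, 2, 1, 2, 1, 2]
table = crosstab(x, y, (1:3, 1:2))
println(table)
```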
If you already have the counts, then you can put them in a matrix, like ChisqTest([x y]). AFAICT this is the same in R, isn’t it?
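When the counts are already tabulated, the test of independence on [x y] uses the expected count row total × column total / grand total for each cell. The plain-Julia sketch below computes Pearson's statistic and its degrees of freedom for such a matrix. pearson_independence and chisq_sf_even are hypothetical helpers; no Yates continuity correction is applied (assumed, not verified here, to match ChisqTest([x y])), and the closed-form tail probability only exists for even degrees of freedom. For odd df you would use something like ccdf(Chisq(df), stat) from Distributions.jl instead.

```julia
# Hypothetical helper: Pearson chi-squared test of independence for an r×c table.
# Expected count for cell (i, j) = rowsum[i] * colsum[j] / total.
function pearson_independence(table::AbstractMatrix{<:Integer})
    rowsums = sum(table; dims = 2)          # r×1
    colsums = sum(table; dims = 1)          # 1×c
    total = sum(table)
    expected = rowsums .* colsums ./ total  # broadcasts to r×c
    stat = sum((table .- expected) .^ 2 ./ expected)
    df = (size(table, 1) - 1) * (size(table, 2) - 1)
    return stat, df
end

# Hypothetical helper: upper tail of the chi-squared distribution, closed form
# valid only for positive even df:
# P(X > s) = exp(-s/2) * sum_{j=0}^{df/2 - 1} (s/2)^j / j!
function chisq_sf_even(s::Real, df::Integer)
    iseven(df) && df > 0 || throw(ArgumentError("closed form needs a positive even df"))
    h = s / 2
    term, acc = 1.0, 1.0
    for j in 1:(df ÷ 2 - 1)
        term *= h / j
        acc += term
    end
    return exp(-h) * acc
end

x = [50, 100, 50]                          # counts in sample 1
y = [30, 120, 50]                          # counts in sample 2
stat, df = pearson_independence([x y])     # 3×2 table, so df = 2
p = chisq_sf_even(stat, df)
println("X² = ", stat, ", df = ", df, ", p = ", p)
```

Note that with identical columns, as in ChisqTest([50,100,50], [50,100,50]) from the question, the observed and expected counts coincide and the statistic is 0.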
prop.test in R is a binomial test with the Wilson score approximation. R also supports the Clopper–Pearson variant via binom.test. Both are supported via BinomialTest in HypothesisTests; see ?confint for details about the supported variants.
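As a cross-check on the Wilson variant mentioned above, the Wilson score interval for a single proportion can be written out directly. This is a sketch: wilson_interval is a hypothetical helper, the 95% normal quantile is hard-coded, and no continuity correction is applied, so it should be compared with prop.test(x, n, correct = FALSE) in R (prop.test applies a continuity correction by default). In HypothesisTests the packaged equivalent should be confint(BinomialTest(x, n); method = :wilson), per ?confint.

```julia
# Hypothetical helper: Wilson score interval for a binomial proportion,
# without continuity correction. z is the standard normal quantile;
# the default 1.959963984540054 gives a 95% interval.
function wilson_interval(successes::Integer, n::Integer; z::Real = 1.959963984540054)
    0 <= successes <= n || throw(ArgumentError("need 0 <= successes <= n"))
    phat = successes / n
    z2 = z^2
    denom = 1 + z2 / n
    center = (phat + z2 / (2 * n)) / denom
    halfwidth = z * sqrt(phat * (1 - phat) / n + z2 / (4 * n^2)) / denom
    return (center - halfwidth, center + halfwidth)
end

lo, hi = wilson_interval(30, 100)
println("95% Wilson interval: (", lo, ", ", hi, ")")
```

Unlike the simple Wald interval phat ± z·sqrt(phat(1 − phat)/n), the Wilson interval is pulled toward 1/2 and never extends outside [0, 1].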

mihrits · March 14, 2022, 1:51pm

Thank you for the reply!

The example I gave was indeed meant as counts stored in vectors, and constructing a matrix from the vectors works. Since the docs mention two vectors, I assumed I could pass counts that way, but apparently I misread them. I also tried the three-argument method, but found that it expects the vectors to have the same length. It should be possible to compute a contingency table from samples of different sizes, right? This seemed strange, but maybe I am missing something.
Thanks for the lead on the Wilson approximation. I forgot to mention that I meant using prop.test to compare two samples, basically two proportions. I am not sure whether or how that can be done with BinomialTest. After some more googling I found that I could replicate R's result without the continuity correction by calculating the z-statistic described here: https://online.stat.psu.edu/stat800/lesson/5/5, and then finding the two-tailed p-value from it. Is it possible to run a hypothesis test for two sample proportions?
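The pooled two-proportion z-test from the linked PSU page fits in a few lines. Assumptions for this sketch: two_prop_ztest and erfc_approx are hypothetical helpers, and because Base Julia has no erfc, the normal tail uses the Abramowitz and Stegun formula 7.1.26 (absolute error below about 1.5e-7); in practice you would use 2 * ccdf(Normal(), abs(z)) from Distributions.jl. The squared statistic z² equals Pearson's chi-squared statistic for the 2×2 table, which is the X-squared value that prop.test(c(x1, x2), c(n1, n2), correct = FALSE) reports. Since R applies Yates' continuity correction by default (correct = TRUE), that would explain why an uncorrected ChisqTest on the 2×2 table came close to R's p-value without matching it.

```julia
# Hypothetical helper: complementary error function for x >= 0 via
# Abramowitz and Stegun 7.1.26 (absolute error below about 1.5e-7).
# Base Julia does not provide erfc.
function erfc_approx(x::Real)
    x >= 0 || return 2 - erfc_approx(-x)
    t = 1 / (1 + 0.3275911 * x)
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    return poly * exp(-x^2)
end

# Hypothetical helper: pooled two-sample z-test for proportions,
# two-sided, no continuity correction.
function two_prop_ztest(x1::Integer, n1::Integer, x2::Integer, n2::Integer)
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = erfc_approx(abs(z) / sqrt(2))   # P(|Z| > |z|) = erfc(|z| / √2)
    return z, p
end

z, p = two_prop_ztest(30, 100, 45, 100)
println("z = ", z, ", z² = ", z^2, ", two-sided p = ", p)
```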

nalimilan · March 14, 2022, 7:46pm

By definition a contingency table crosses values of variables from a single sample. Maybe you can reformulate your data as a single variable crossed with a variable indicating which sample each value comes from?
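The reformulation suggested above can be sketched as follows: stack both samples into one value vector and record in a group vector which sample each observation came from, so the two vectors have the same length even when the samples do not. The sample data and the 1..3 category coding are made up for illustration; with HypothesisTests the resulting pair could then be passed as ChisqTest(value, group, (1:3, 1:2)), matching the range-tuple signature listed in the question.

```julia
# Illustrative data: two samples of a categorical variable coded 1..3,
# with different sample sizes.
sample_a = [1, 2, 2, 3, 1, 2]
sample_b = [3, 3, 2, 1]

# One long-format dataset: the observed value plus the sample it came from.
value = vcat(sample_a, sample_b)
group = vcat(fill(1, length(sample_a)), fill(2, length(sample_b)))

# Cross value (rows, 1..3) with group (columns, 1..2).
table = zeros(Int, 3, 2)
for (v, g) in zip(value, group)
    table[v, g] += 1
end
println(table)
```

Each column of table then holds one sample's counts, which is the same [x y] layout as passing the counts directly.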

Unfortunately we don’t support two sample binomial tests for now, though there’s a PR open for that. If you don’t want to use the code from that PR, you could use Fisher’s exact test instead.
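For comparing two proportions, Fisher's exact test works on the 2×2 table with one row per sample and columns for successes and failures. The sketch below computes a two-sided p-value in plain Julia with exact rational arithmetic, using the rule described in R's fisher.test documentation: sum the probabilities of all tables with the same margins that are no more likely than the observed one. fisher_two_sided is a hypothetical helper; HypothesisTests provides FisherExactTest(a, b, c, d), but its default two-sided p-value definition may differ from R's, so check ?pvalue before comparing.

```julia
# Hypothetical helper: two-sided Fisher exact test for the 2×2 table [a b; c d].
# Conditioning on the margins, the top-left cell follows a hypergeometric law;
# the p-value sums P(table) over all tables no more likely than the observed one.
function fisher_two_sided(a::Integer, b::Integer, c::Integer, d::Integer)
    r1, r2 = a + b, c + d        # row totals
    c1 = a + c                   # first column total
    n = r1 + r2
    # Exact hypergeometric probability that the top-left cell equals k.
    prob(k) = (binomial(big(r1), big(k)) * binomial(big(r2), big(c1 - k))) //
              binomial(big(n), big(c1))
    observed = prob(a)
    kmin, kmax = max(0, c1 - r2), min(r1, c1)
    return sum(prob(k) for k in kmin:kmax if prob(k) <= observed)
end

p = fisher_two_sided(3, 1, 1, 3)   # the classic "lady tasting tea" table
println("two-sided p = ", p, " ≈ ", Float64(p))

# Two proportions, 30/100 vs 45/100: rows are samples, columns are successes/failures.
p2 = fisher_two_sided(30, 70, 45, 55)
println("two-sided p = ", Float64(p2))
```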

mihrits · March 14, 2022, 8:29pm

I am not that familiar with the chi-squared test beyond some practical examples, so I was just toying around with it. Thank you for the answers!

Related topics

Help with ChisqTest (March 2, 2020)
Entering xlsx columns into HypothosisTests (February 27, 2020)
[ANN] PermutationTests.jl, a package for multiple hypothesis testing (July 15, 2024)
ANOVA Tests in Julia? (August 11, 2022)
Fisher's test p-value results appear to differ from matlab, R (May 1, 2018)