# Hypothesis testing in Julia

I am taking a statistical data science course that is taught using R, and I am trying to replicate all the practical exercises in Julia. As we are doing quite basic stuff, almost everything has been straightforward and similar to R, but I have been having some problems with hypothesis testing.

The package I am using is HypothesisTests.jl.

1. I don’t exactly understand the implementation of `ChisqTest` in that package. I get that if I want the goodness-of-fit test I have to provide a vector `x` and `theta0` and that works as expected. But for the contingency table test, it seems to only accept `x` as a matrix and not `x` and `y` (for example `ChisqTest([50,100,50], [50,100,50])`) as it would seem from the docs (link). The error message shows that the closest candidates are
```
ChisqTest(::AbstractVector{T}, ::AbstractVector{T}, ::Tuple{UnitRange{T}, UnitRange{T}})
ChisqTest(::AbstractVector{T}, ::AbstractVector{T}, ::T)
```

I don’t really understand where these come from or what values I should use there. I tried `ChisqTest([50,100,50], [50,100,50], (1:3,1:3))` and `ChisqTest([50,100,50], [50,100,50], 3)`, but both give an error “ArgumentError: at least one entry must be positive”, which confuses me. So I guess the main question here is, what is the argument `y` for? I thought I could use it for a contingency table test with two vectors, but I might be wrong.

2. There is `prop.test` (link) for testing probabilities/proportions in R. RDocumentation doesn’t really explain which test is behind it. Googling has led me to believe that it is a z-test for proportions, which, if I understand correctly, isn’t available in the `HypothesisTests.jl` package. For the simpler case of comparing two proportions, I tried making vectors of ones and zeros with the correct proportions and then applying a two-sample z-test, but that didn’t yield the same result, so I don’t think that is the correct workaround. The R version reports a chi-squared value for the test, so I also tried `ChisqTest` with a contingency table. That gave a p-value much closer to the R function’s, but not the same, and as far as I understand it should not be the correct approach. Any suggestions on how to replicate R’s `prop.test` in Julia would be appreciated.
1. Regarding the chi-squared test, can you describe what your vectors are? The three-argument `ChisqTest` method expects `x` and `y` to be vectors with one value per independent observation, and it computes the contingency table from them. The third argument gives the possible values for `x` and `y`, but it only supports ranges (so the values must be contiguous) and is undocumented. For your example this would give `ChisqTest([50,100,50], [50,100,50], (50:100, 50:100))`. We should probably fix this by not requiring the third argument; feel free to file an issue on GitHub.
If you already have the counts, then you can put them in a matrix, like `ChisqTest([x y])`. AFAICT this is the same in R, isn’t it?

2. `prop.test` in R is a binomial test with the Wilson approximation. R also supports the Clopper–Pearson variant via `binom.test`. Both are supported via `BinomialTest` in HypothesisTests; see `?confint` for details about the supported variants.
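
As a rough sketch of both points (all counts here are made up for illustration): counts stored in two vectors can be concatenated into a matrix for `ChisqTest`, and a one-sample proportion can be tested with `BinomialTest`, choosing the interval variant through the `method` keyword of `confint`. As far as I know the Wilson interval here has no continuity correction, so the closest R comparison would probably be `prop.test(..., correct = FALSE)`. The p-value from `BinomialTest` is the exact binomial one, so it need not match the chi-squared based p-value that `prop.test` prints.

```julia
using HypothesisTests

# Counts for two groups over the same three categories (made-up numbers).
x = [50, 100, 50]
y = [60, 90, 50]

# [x y] concatenates the two vectors into a 3×2 matrix of counts,
# which ChisqTest treats as a contingency table.
chisq = ChisqTest([x y])
p_chisq = pvalue(chisq)

# One-sample proportion: 43 successes in 100 trials against p0 = 0.5
# (made-up numbers). The p-value comes from the exact binomial test.
binom = BinomialTest(43, 100, 0.5)
p_binom = pvalue(binom)

# Wilson score interval for the proportion.
ci_wilson = confint(binom; method = :wilson)

# Clopper-Pearson interval.
ci_cp = confint(binom; method = :clopper_pearson)
```

Both `ci_wilson` and `ci_cp` are `(lower, upper)` tuples, so they can be compared directly against the intervals printed by R.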


Thank you for the reply!

1. The example I gave was indeed meant to be the counts in vectors. And constructing a matrix from the vectors works. As I saw from the docs that it was possible to use two vectors, I thought I could do so with counts, but apparently misinterpreted the docs. Nevertheless, I now also tried the three-argument method but found that it expects the vectors to be of the same length. It should be possible to compute a contingency table from samples of different sizes, right? This seemed strange, but maybe I am missing something.

2. Thanks for the lead on the Wilson approximation. I forgot to mention that I was thinking of using `prop.test` to compare two samples, that is, two proportions. I am not sure how to do that with `BinomialTest`, or whether it is possible at all. After some more googling, I found that I could replicate the R result without the continuity correction by calculating the z-statistic described here: https://online.stat.psu.edu/stat800/lesson/5/5 and then finding the two-tailed p-value from it. Is there a way to run a hypothesis test for two sample proportions?
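
For reference, the hand calculation can be sketched roughly as follows. The counts are made up, and the pooled standard error is my reading of the linked formula. The normal distribution comes from Distributions.jl. The squared z-statistic is the Pearson chi-squared statistic of the 2×2 table of successes and failures, so `ChisqTest` on that table should give the same two-sided p-value, as long as no continuity correction is applied.

```julia
using Distributions, HypothesisTests

# Made-up counts: successes and sample sizes for the two groups.
x1, n1 = 45, 100
x2, n2 = 30, 100

p1 = x1 / n1
p2 = x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 == p2

# Standard error of the difference under H0, then the z-statistic.
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal distribution.
p_z = 2 * ccdf(Normal(), abs(z))

# The same comparison as a chi-squared test on the 2×2 table
# (rows: samples, columns: successes and failures).
f1 = n1 - x1
f2 = n2 - x2
table = [x1 f1; x2 f2]
p_chisq = pvalue(ChisqTest(table))
```

In R, the matching call would presumably be `prop.test(c(45, 30), c(100, 100), correct = FALSE)`. With the default `correct = TRUE`, R applies a continuity correction, which would explain a close but different p-value.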

By definition a contingency table crosses values of variables from a single sample. Maybe you can reformulate your data as a single variable crossed with a variable indicating which sample each value comes from?
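
A minimal sketch of that reformulation, with made-up data: the two samples have different sizes, and the table crosses the observed category with the sample each observation came from. The names `sample_a`, `sample_b` and `categories` are just for illustration.

```julia
using HypothesisTests

# Two made-up samples of different sizes; values are category labels 1, 2 or 3.
sample_a = [1, 1, 2, 2, 2, 3, 1, 2, 3, 3, 2, 1]
sample_b = [1, 2, 2, 3, 3, 3, 2, 3]

categories = 1:3
samples = [sample_a, sample_b]

# Rows are categories, columns are samples: entry (k, j) counts how often
# category k occurs in sample j.
table = [count(==(k), s) for k in categories, s in samples]

test = ChisqTest(table)
p = pvalue(test)
```

With counts this small the chi-squared approximation is rough, so the numbers only show the mechanics.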

Unfortunately we don’t support two sample binomial tests for now, though there’s a PR open for that. If you don’t want to use the code from that PR, you could use Fisher’s exact test instead.
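
A rough sketch of the Fisher alternative, using made-up counts for the two samples: `FisherExactTest` takes the four cells of the 2×2 table as separate integer arguments.

```julia
using HypothesisTests

# Made-up counts: successes and sample sizes for the two groups.
x1, n1 = 45, 100
x2, n2 = 30, 100

# FisherExactTest(a, b, c, d) corresponds to the 2×2 table
#   a  b
#   c  d
# Here each column is one sample: successes in the first row,
# failures in the second row.
test = FisherExactTest(x1, x2, n1 - x1, n2 - x2)
p = pvalue(test)
```

This is an exact test, so its p-value will generally differ a little from the chi-squared based one that R’s `prop.test` reports.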


I am not that familiar with the chi-squared test besides some practical examples, so I was just toying around with it. Thank you for the answers!