Mann-Whitney U Test vs. simple Bayesian inference

I’m looking at the Mann-Whitney U test from HypothesisTests.jl, and it is described as follows:

Perform a Mann-Whitney U test of the null hypothesis that the probability that an observation drawn from the same population as x is greater than an observation drawn from the same population as y is equal to the probability that an observation drawn from the same population as y is greater than an observation drawn from the same population as x against the alternative hypothesis that these probabilities are not equal.

I’m wondering if an equivalent kind of test couldn’t be done by simple Bayesian inference. Suppose I have two vectors:

using Distributions
using HypothesisTests
using Random
using StatsPlots
using Turing

Random.seed!(5)
u = rand(1:10, 50)
v = rand(2:11, 50)

If I call MannWhitneyUTest(u, v), it fails to reject H_0, with a two-side p-value of 0.1132. Given the way H_0 is stated, could I not just do this in the Bayesian paradigm as follows (it’s literally just the coin toss example from Turing.jl)?:

@model function dist_test(y)
    p ~ Beta(1, 1)
    y ~ filldist(Bernoulli(p), length(y))
    return y
end

model = dist_test([rand(u) > rand(v) for _ in 1:50])
chain = sample(model, NUTS(), 1_000)

In this case, the Bayesian approach doesn’t “agree” with the frequentist approach since 98%+ of our samples are below 0.5, but we “correctly” infer that the two samples are from different distributions.

I’m not attempting to initiate a debate about Frequentist vs. Bayesian as there’s plenty of material about that already. I’m just wondering if the Bayesian approach I’ve taken here is roughly equivalent to the approach using the Mann-Whitney U test, because it’s much easier for me to understand and explain the Bayesian approach.

1 Like

I can’t answer your question directly but this article might be of interest for you: https://www.tandfonline.com/doi/full/10.1080/02664763.2019.1709053.

2 Likes

Your proposed approach might work in cases with few ties, but it will be downwardly biased in the presence of many ties. For example, you wouldn’t want to conclude that the distributions are different if PR(X>Y) = 0.01, PR(X<Y) = 0.01, and PR(X=Y) = 0.98.

When the data are discrete and bounded, a straight-forward way to conduct such a test is to model the two vectors as categorical. Then, you can use the posterior draws to compare PR(X>Y) to PR(X<Y) without PR(X=Y) contaminating the results. This also motivates the Bayesian version, since you can apply whatever prior you want to the probabilities, including one that shrinks the two sets towards each other. You could add a smoother to the probabilities. etc.

using Plots
using StatsPlots
using Turing
using HypothesisTests

# Generate data; y is slightly higher than x
x = rand([1,1,2,3,4], 20)
y = rand([1,2,3,4,4], 20)

# Typical Mann-Whitney U test
MannWhitneyUTest(x, y)

# Bayesian model for category probabilities
# Assumes only categorical/multinomial
# distribution
@model function mann_whitney(x,y)
    px ~ Dirichlet([1,1,1,1])
    py ~ Dirichlet([1,1,1,1])

    x ~ Categorical(px)
    y ~ Categorical(py)
end

mw_mod = mann_whitney(x, y)
mw_results = sample(mw_mod, NUTS(1_000, 0.85), 1_000)

# Extract the category probabilities
px1 = Array(mw_results[["px[1]"]])
px2 = Array(mw_results[["px[2]"]])
px3 = Array(mw_results[["px[3]"]])
px4 = Array(mw_results[["px[4]"]])
py1 = Array(mw_results[["py[1]"]])
py2 = Array(mw_results[["py[2]"]])
py3 = Array(mw_results[["py[3]"]])
py4 = Array(mw_results[["py[4]"]])

# Probability that X and Y are at the same level (ties)
px_equal = px1 .* py1 +
           px2 .* py2 +
           px3 .* py3 +
           px4 .* py4
# Probability that X is higher than Y
px_higher = px2 .* py1 +
            px3 .* (py1 + py2) +
            px4 .* (py1 + py2 + py3)
# Probability that X is lower than Y
px_lower = px1 .* (py2 + py3 + py4) +
           px2 .* (py3 + py4) +
           py3 .* py4

# Summarize: PR(X>Y) - PR(X<Y)
higher_lower = px_higher - px_lower
mean(higher_lower)
std(higher_lower)
quantile(higher_lower, [0.025, 0.5, 0.975])
mean(higher_lower .> 0)

trace_plot = plot(higher_lower)
density_plot = plot(higher_lower, seriestype = "density")
plot(trace_plot, density_plot, layout = (1, 2), legend = false)
1 Like