`NaN` in Statistical Hypothesis Testing

How can we ignore missing values or NaN during testing?

using HypothesisTests

A = [0.1, NaN, 0.16]
B = [1.0, 0.9, 0.82]

EqualVarianceTTest(A, B)

output:

Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          NaN
    95% confidence interval: (NaN, NaN)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           NaN

Details:
    number of observations:   [3,3]
    t-statistic:              NaN
    degrees of freedom:       4
    empirical standard error: NaN

(I don’t know the answer to your question, but I adjusted the title a bit to make it clear to others that this is a question about statistical testing, not unit testing!)


It’s easy to drop whatever values you don’t want to use before running the test: EqualVarianceTTest(filter(!isnan, A), B).
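Since the question also mentions missing values, here is a minimal sketch using only Base functions, assuming the data may contain `missing` as well as `NaN`. The helper name `dropinvalid` and the vector `C` are made up for illustration.

```julia
# Hypothetical helper: drop both `missing` and NaN entries from a vector.
# skipmissing removes `missing`; filter(!isnan, ...) then removes NaN.
dropinvalid(v) = filter(!isnan, collect(skipmissing(v)))

A = [0.1, NaN, 0.16]
C = [0.1, missing, NaN, 0.16]   # hypothetical vector with both kinds of gaps

A_clean = dropinvalid(A)
C_clean = dropinvalid(C)
```

The cleaned vector can then be passed to the test as before, e.g. EqualVarianceTTest(A_clean, B). For an unpaired two-sample test the two vectors do not need to have the same length.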

If your observations in A and B are paired and you want to drop the corresponding elements from both (that is, remove not only A[2], which is NaN, but also B[2]), then it’s most convenient to put these arrays into a single columnar table. In Julia, it’s effectively free, and you retain the same familiar array interface:

using HypothesisTests, StructArrays
# best to keep both A and B together from the beginning, if they are paired
data = StructArray(; A, B)

# drop every row in which any field is NaN
data = filter(x -> !any(isnan, x), data)
EqualVarianceTTest(data.A, data.B)
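If you would rather not add a dependency, the same paired filtering can be sketched with a boolean mask in plain Base Julia. This is only an illustration of the idea, not the approach from the post above; the names `keep`, `A_paired` and `B_paired` are chosen here for the example.

```julia
A = [0.1, NaN, 0.16]
B = [1.0, 0.9, 0.82]

# Keep index i only if neither A[i] nor B[i] is NaN.
keep = .!(isnan.(A) .| isnan.(B))

A_paired = A[keep]
B_paired = B[keep]
```

Both filtered vectors keep the same indices, so the pairing between observations is preserved.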