`NaN` in Statistical Hypothesis Testing

Sahil_Khan · September 29, 2023, 8:44am

How can we ignore missing values or NaN during testing?

using HypothesisTests

A = [0.1, NaN, 0.16]
B = [1.0, 0.9, 0.82]

EqualVarianceTTest(A, B)

output:

Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          NaN
    95% confidence interval: (NaN, NaN)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           NaN

Details:
    number of observations:   [3,3]
    t-statistic:              NaN
    degrees of freedom:       4
    empirical standard error: NaN

Sukera · September 29, 2023, 11:55am

(I don’t know the answer to your question, but I adjusted the title a bit to make it clear to others that this is a question about statistical testing, not unit testing!)

aplavin · September 29, 2023, 1:15pm

It's easy to drop whatever values you don’t want to use before running the test: `EqualVarianceTTest(filter(!isnan, A), B)`.
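For the unpaired case, the one-liner above can be spelled out as a minimal sketch, assuming HypothesisTests.jl is installed. The names `A_clean`, `B_clean`, `t`, and `p` are only illustrative.

```julia
using HypothesisTests

A = [0.1, NaN, 0.16]
B = [1.0, 0.9, 0.82]

# Drop NaN from each sample independently (only appropriate for unpaired samples).
A_clean = filter(!isnan, A)   # the NaN at A[2] is removed
B_clean = filter(!isnan, B)   # B has no NaN, so nothing is removed

t = EqualVarianceTTest(A_clean, B_clean)
p = pvalue(t)                 # two-sided p-value, no longer NaN
```

If the data used `missing` instead of `NaN`, `collect(skipmissing(A))` should play the same role as `filter(!isnan, A)` here.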

If your observations in `A` and `B` are paired and you want to drop the corresponding elements from both (that is, remove not only `A[2]`, which is NaN, but also `B[2]`), then it’s most convenient to put the two arrays into a single columnar table. In Julia this is effectively free, and you keep the same familiar array interface:

using HypothesisTests, StructArrays

# best to keep both A and B together from the beginning, if they are paired
data = StructArray(; A, B)

# keep only the rows where neither A nor B is NaN
data = filter(x -> !any(isnan, x), data)
EqualVarianceTTest(data.A, data.B)
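
If you would rather not add a dependency, the same paired filtering can be sketched with a boolean mask in plain Julia. This is an alternative to the StructArrays approach, not a replacement for it, and the names `keep`, `A_paired`, and `B_paired` are illustrative.

```julia
A = [0.1, NaN, 0.16]
B = [1.0, 0.9, 0.82]

# keep[i] is true only when neither A[i] nor B[i] is NaN
keep = [!(isnan(a) || isnan(b)) for (a, b) in zip(A, B)]

# Index both arrays with the same mask so the pairs stay aligned.
A_paired = A[keep]
B_paired = B[keep]
```

The filtered vectors can then be passed to `EqualVarianceTTest` as in the post above.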
