Statistics.mean() function with a Matrix containing missing values

austin-putz · February 6, 2023, 3:08am

I have a matrix with missing values and I want to calculate the column means, but I'm not sure how to skip the missing values.

using Statistics

vec = [1, missing, 2]
mean(vec)                 # returns missing
mean(skipmissing(vec))    # gives 1.5, which is what I want in 2-D

# now I have a Matrix A with missing
A = [1 5
       6 missing]

# unsure how to calculate column means
Statistics.mean(A, dims=1)                # missing propagates: column 2 comes back as missing
Statistics.mean(skipmissing(A), dims=1)   # error: dims is not accepted here

Everywhere I look I only find vector examples, and I don't see any option in the mean function to skip missing values along a dimension of a 2-D array. Any help would be appreciated.
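For reference, here is a small sketch of what the two attempts above do on the example matrix. The variable names `m` and `threw` are just for illustration; the point is that `mean(A, dims=1)` does not skip anything (missing simply propagates into the affected column), while passing `dims` to a `skipmissing` iterator is rejected.

```julia
using Statistics

# Example matrix from the question; its element type is Union{Missing, Int}.
A = [1 5
     6 missing]

# Column means without skipping: missing propagates into the column that contains it.
m = mean(A, dims=1)
println(m)   # a 1×2 matrix holding 3.5 and missing

# dims is not accepted for a SkipMissing iterator, so this call throws.
threw = try
    mean(skipmissing(A), dims=1)
    false
catch
    true
end
println(threw)
```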

Krastanov · February 6, 2023, 3:28am

skipmissing returns a lazy iterator with a single (linear) axis; it does not allocate a copy:

julia> skipmissing(A) |> collect |> size
(3,)

The dims keyword does not work in that case because we have a single axis.
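A minimal sketch of that flattening on the example matrix: `skipmissing` walks `A` in linear (column-major) order, so the 2-D shape is gone and a plain `mean` averages every non-missing entry together.

```julia
using Statistics

A = [1 5
     6 missing]

# skipmissing visits A in column-major order and drops missing values,
# so the result has a single axis.
flat = collect(skipmissing(A))
println(flat)                   # [1, 6, 5]

# mean over the flat iterator averages all non-missing entries at once.
overall = mean(skipmissing(A))
println(overall)                # 4.0
```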

You can make an iterator that goes over each slice you care about (with eachslice or eachcol or eachrow) and then broadcast skipmissing and mean on the resulting iterator of single-axis arrays.

mean.(skipmissing.(eachrow(A)))
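Applied to the example matrix, the same broadcast pattern gives row means with `eachrow` and column means with `eachcol` (a sketch; `col_means` and `row_means` are illustrative names):

```julia
using Statistics

A = [1 5
     6 missing]

# One skipmissing iterator per column, then the mean of each: column means.
col_means = mean.(skipmissing.(eachcol(A)))

# The same idea over rows: row means.
row_means = mean.(skipmissing.(eachrow(A)))

println(col_means)   # [3.5, 5.0]
println(row_means)   # [3.0, 6.0]
```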

There might be a cleaner way to do it. The underlying issue is that skipmissing cannot return a multi-axis array: different rows may contain different numbers of missing values, so after skipping, each row would have a different length.
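To make the ragged-shape problem concrete, here is a small sketch: counting non-missing entries per column shows that the columns would end up with different lengths, and skipping per column yields a vector of vectors rather than a matrix.

```julia
A = [1 5
     6 missing]

# Non-missing entries per column differ (2 vs 1), so
# "A with its missings removed" cannot be a rectangular matrix.
nonmissing_per_col = count(!ismissing, A; dims=1)
println(nonmissing_per_col)   # [2 1]

# Skipping per column instead gives a ragged vector of vectors.
ragged = collect.(skipmissing.(eachcol(A)))
println(length.(ragged))      # [2, 1]
```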

briochemc · February 6, 2023, 4:04am

Related issue: "sum and mean of skipmissings don't accept the dims kwarg" (JuliaLang/julia issue #40081 on GitHub)

austin-putz · February 6, 2023, 4:58pm

Perfect, this line worked:

mean.(skipmissing.(eachrow(A')))

(Small edit: I needed column means, so I transpose my matrix, which is individuals (3k) by SNPs (45k).)
Okay, this was my worry. I hope the Statistics package improves its handling of missing values soon, as this is more work than it needs to be (IMO). Thank you for your help!

austin-putz · February 6, 2023, 4:58pm

Thank you very much for your suggestion here, I will read this over now.

Krastanov · February 6, 2023, 5:54pm

There is also eachcol, which gives an iterator over columns. It does not really matter whether you transpose or switch from eachrow to eachcol.
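A quick sketch checking that the two spellings agree on the example matrix. It also shows one way to get the 1×n shape that `mean(A, dims=1)` would return, using `permutedims` on the resulting vector; that reshaping step is an optional extra, not something from the thread.

```julia
using Statistics

A = [1 5
     6 missing]

# Rows of the adjoint A' are the columns of A, so both give column means.
via_transpose = mean.(skipmissing.(eachrow(A')))
via_eachcol   = mean.(skipmissing.(eachcol(A)))
println(via_transpose == via_eachcol)   # true

# Both return a Vector; permutedims turns it into a 1×n matrix,
# the same shape that mean(A, dims=1) returns.
row_shaped = permutedims(via_eachcol)
println(size(row_shaped))               # (1, 2)
```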

austin-putz · February 6, 2023, 6:54pm

Oh shoot, I tried eachcolumn() and it didn't work. Thanks, I'll use eachcol().

aplavin · February 6, 2023, 9:03pm

Well, you lose some performance with mean.(skipmissing.(eachcol(A))) compared to a potential mean(skipmissing(A), dims=1), but the former is more general: substitute any aggregation for mean and it will work, without special support from the function.
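To illustrate that generality, here is a sketch that wraps the per-column pattern in a small helper and swaps in different aggregations. `colreduce` is a hypothetical name chosen for this example, not a function from Base or Statistics.

```julia
using Statistics

A = [1 5
     6 missing]

# Hypothetical helper: apply any iterator-accepting reduction f
# to each column of M, skipping missing values.
colreduce(f, M) = [f(skipmissing(c)) for c in eachcol(M)]

col_sums    = colreduce(sum, A)       # [7, 5]
col_maxima  = colreduce(maximum, A)   # [6, 5]
col_medians = colreduce(median, A)    # [3.5, 5.0]

println(col_sums)
println(col_maxima)
println(col_medians)
```

Because `f` only ever sees a plain iterator, no function needs a `dims`-aware method for skipped data.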

Anyway, there's a long-stalled PR linked from the issue above ("Support mapreduce over dimensions with SkipMissing" by nalimilan, JuliaLang/julia pull request #28027 on GitHub), so you may wish to update or promote it if this feature seems important.

austin-putz · February 6, 2023, 9:52pm

Oh I see, thank you for this information. Then the developers must already be aware. I'm not much of a developer; I'm still trying to learn the basics. Julia is kind of a beast to learn compared to R. Thanks for all your help.
