# Statistics.mean() function with a Matrix containing missing values

I have a matrix with missing values and I want to calculate the column means. I’m not sure how to drop the missing values.

``````
using Statistics

vec = [1, missing, 2]
mean(vec)                # returns missing
mean(skipmissing(vec))   # gives 1.5, which is what I want in 2 dimensions

# now I have a Matrix A with missing
A = [1 5
6 missing]

# unsure how to calculate column means
Statistics.mean(A, dims=1)    # error for missing value
Statistics.mean(skipmissing(A), dims=1)   # error

``````

Everywhere I look I can only find vector examples… I don’t see any options in the mean function to drop missing in 2d. Any help would be appreciated.

`skipmissing` returns a linear iterator (one single axis) without allocating a copy:

``````
julia> skipmissing(A) |> collect |> size
(3,)
``````

The `dims` keyword does not work in that case because we have a single axis.

You can make an iterator that goes over each slice you care about (with `eachslice` or `eachcol` or `eachrow`) and then broadcast `skipmissing` and `mean` on the resulting iterator of single-axis arrays.

``````
mean.(skipmissing.(eachrow(A)))
``````

There might be a cleaner way to do it. The underlying issue is that `skipmissing` cannot return a multi-axis array, because different rows might need a different number of skips when they contain a different number of `missing`s.
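To make that last point concrete, here is a minimal sketch using the matrix `A` from the question (the variable names `kept` and `col_means` are only for illustration). It shows that each column keeps a different number of values once `missing` is skipped, which is why the slices have to be reduced one at a time:

```julia
using Statistics

A = [1 5
     6 missing]

# Collect the non-missing values of each column separately.
# The resulting vectors have different lengths, so they cannot be
# packed back into a rectangular matrix.
kept = [collect(skipmissing(col)) for col in eachcol(A)]

# Reduce each column on its own instead.
col_means = [mean(skipmissing(col)) for col in eachcol(A)]
```

Here `kept` holds a 2-element vector for the first column and a 1-element vector for the second, and `col_means` has one entry per column.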

Perfect, this line worked:

``````
mean.(skipmissing.(eachrow(A')))
``````

(Small edit: I needed column means, so I transpose my matrix, which is number of individuals (3k) by number of SNPs (45k).)

Okay, this was my worry. I hope the Statistics package improves this soon to deal with missing values, as this is more work than it needs to be (imo…). Thank you for your help!

Thank you very much for your suggestion here, I will read this over now.

There is also `eachcol` which gives an iterator over columns. It does not really matter whether you transpose or whether you switch from `eachrow` to `eachcol`.
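As a quick check of that equivalence, a small sketch with the matrix from the question (the names `via_cols` and `via_transpose` are just illustrative):

```julia
using Statistics

A = [1 5
     6 missing]

# Column means by iterating over columns directly.
via_cols = mean.(skipmissing.(eachcol(A)))

# Column means by iterating over the rows of the transposed matrix.
via_transpose = mean.(skipmissing.(eachrow(A')))

# Both approaches give the same vector of column means.
same_result = via_cols == via_transpose
```

Note that `A'` is the adjoint, which for real-valued data like this is the same as the transpose.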

Oh shoot, I tried `eachcolumn()` and it didn’t work. Thanks, I’ll use `eachcol()`.

Well, you lose some performance with `mean.(skipmissing.(eachcol(A)))` compared to a potential `mean(skipmissing(A), dims=1)`, but the former is more general: substitute any aggregation for `mean` and it’ll work, without special support from the function.
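A rough sketch of that generality, assuming a small helper called `colwise` (a hypothetical name, not something provided by Statistics) that applies any reduction to the non-missing values of each column:

```julia
using Statistics

# Hypothetical helper, not part of Statistics: apply the reduction `f`
# to the non-missing entries of each column of `M`.
colwise(f, M) = [f(skipmissing(col)) for col in eachcol(M)]

A = [1 5
     6 missing]

col_means   = colwise(mean, A)
col_sums    = colwise(sum, A)
col_maxima  = colwise(maximum, A)
col_medians = colwise(median, A)
```

Any function that accepts an iterator can be passed in the same way. A column that contains only `missing` leaves nothing to reduce, so it would likely need special handling.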

Anyway, there’s a long-stalled PR linked from the issue above (https://github.com/JuliaLang/julia/pull/28027), so you may wish to update/promote it if this feature seems important.

Oh I see… Thank you for this information. Well, then they must be aware of it. I’m not much for development; I’m still trying to learn the basics. Julia is kind of a beast to learn compared to R. Thanks for all your help.
