I have a matrix with missing values and I want to calculate the column means. I’m not sure how to drop missing.
using Statistics
vec = [1, missing, 2]
mean(vec) # is missing
mean(skipmissing(vec)) # gives 1.5, which is what I want in 2 dimensions
# now I have a Matrix A with missing
A = [1 5
6 missing]
# unsure how to calculate column means
Statistics.mean(A, dims=1) # error for missing value
Statistics.mean(skipmissing(A), dims=1) # error
Everywhere I look I can only find vector examples… I don’t see any options in the mean function to drop missing in 2d. Any help would be appreciated.
`skipmissing` returns a linear iterator (one single axis) without allocating a copy:
julia> skipmissing(A) |> collect |> size
(3,)
The `dims` keyword does not work in that case because we have a single axis.
You can make an iterator that goes over each slice you care about (with `eachslice` or `eachcol` or `eachrow`) and then broadcast `skipmissing` and `mean` on the resulting iterator of single-axis arrays.
mean.(skipmissing.(eachrow(A)))
There might be a cleaner way to do it. The underlying issue is that `skipmissing` cannot return a multi-axis array, because different rows might need a different number of skips when they contain a different number of `missing`s.
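To make the point above concrete, here is a minimal sketch using the 2×2 matrix `A` from the question (the variable names `flat` and `row_means` are only for illustration). It shows that `skipmissing` on a matrix loses the row/column structure, while applying it slice by slice keeps one result per row.

```julia
using Statistics

A = [1 5
     6 missing]

# skipmissing walks the matrix in column-major order and drops missing,
# so collecting it yields a plain vector with a single axis.
flat = collect(skipmissing(A))

# Apply skipmissing and mean to each row separately:
# row 1 averages 1 and 5, row 2 averages only 6.
row_means = mean.(skipmissing.(eachrow(A)))
```

Each row gets its own `skipmissing` wrapper, so rows with different numbers of missing entries are not a problem.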
Perfect, this line worked:
mean.(skipmissing.(eachrow(A')))
(Small edit: I needed column means, so I take the transpose of my matrix of individuals (3k) by SNPs (45k).)
Okay, this was my worry. I hope the Statistics package improves this soon to deal with missing, as this is more work than it needs to be (imo…). Thank you for your help!
Thank you very much for your suggestion here, I will read this over now.
There is also `eachcol`, which gives an iterator over columns. It does not really matter whether you transpose or whether you switch from `eachrow` to `eachcol`.
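As a quick check of that equivalence, a small sketch with the matrix `A` from the question (the variable names are only for illustration):

```julia
using Statistics

A = [1 5
     6 missing]

# Column means via eachcol, skipping missing values in each column.
col_means_eachcol = mean.(skipmissing.(eachcol(A)))

# The same thing via the transpose and eachrow.
col_means_transposed = mean.(skipmissing.(eachrow(A')))

# Both approaches give identical results.
same = col_means_eachcol == col_means_transposed
```

Using `eachcol` directly avoids wrapping the matrix in a transpose and reads a bit closer to the intent.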
Oh shoot, I tried `eachcolumn()` and it didn’t work. Thanks, I’ll use `eachcol()`.
Well, you lose some performance in `mean.(skipmissing.(eachcol(A)))` compared to a potential `mean(skipmissing(A), dims=1)`, but the former is more general: substitute any aggregation for `mean` and it’ll work, without special support by the function.
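To illustrate the generality, here is a rough sketch. The helper `colreduce` is hypothetical (it is not part of Statistics or Base); it just wraps the same slice-and-skip pattern so that any reduction can be plugged in.

```julia
using Statistics

A = [1 5
     6 missing]

# Hypothetical helper: apply the reduction `f` to each column of `M`
# after dropping that column's missing values.
colreduce(f, M) = [f(skipmissing(col)) for col in eachcol(M)]

col_means   = colreduce(mean, A)
col_sums    = colreduce(sum, A)
col_maxima  = colreduce(maximum, A)
col_medians = colreduce(median, A)
```

Nothing here relies on `f` knowing about `dims` or about missing values. It only needs to accept an iterator.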
Anyway, there’s a long-stalled PR linked from the issue above (Support mapreduce over dimensions with SkipMissing by nalimilan · Pull Request #28027 · JuliaLang/julia · GitHub), so you may wish to update/promote it if this feature seems important.
Oh I see… Thank you for this information. Well then they must be aware, I’m not much for development, I’m still trying to learn the basics. Julia is kind of a beast compared to R to learn. Thanks for all your help.