I have a matrix with missing values and I want to calculate the column means. I’m not sure how to skip the missing values.

```
using Statistics
vec = [1, missing, 2]
mean(vec)               # is missing
mean(skipmissing(vec))  # gives 1.5, which is what I want in 2 dimensions
# now I have a Matrix A with a missing value
A = [1 5
     6 missing]
# unsure how to calculate column means
Statistics.mean(A, dims=1)               # error for missing value
Statistics.mean(skipmissing(A), dims=1)  # error
```

Everywhere I look I can only find vector examples… I don’t see any options in the mean function to drop missing in 2d. Any help would be appreciated.

`skipmissing` returns a linear iterator (one single axis) without allocating a copy:

```
julia> skipmissing(A) |> collect |> size
(3,)
```

The `dims` keyword does not work in that case because we have a single axis.
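To make the flattening concrete, here is a small sketch using the same 2×2 matrix `A` from the question. The `try`/`catch` wrapper is only there so the snippet keeps running past the failing call; it is not something you would normally write.

```julia
using Statistics

A = [1 5
     6 missing]

itr = skipmissing(A)

# Collecting the iterator gives a flat vector in column-major order
flat = collect(itr)       # [1, 6, 5]

# Without `dims`, the mean runs over all non-missing entries at once
overall = mean(itr)       # 4.0

# `dims` is not accepted for the skipmissing iterator
try
    mean(itr, dims=1)
catch err
    println("mean(skipmissing(A), dims=1) failed with: ", typeof(err))
end
```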

You can make an iterator that goes over each slice you care about (with `eachslice` or `eachcol` or `eachrow`) and then broadcast `skipmissing` and `mean` on the resulting iterator of single-axis arrays.

```
mean.(skipmissing.(eachrow(A)))
```

There might be a cleaner way to do it. The underlying issue is that `skipmissing` cannot return a multi-axis array because different rows might need a different number of skips due to a different number of `missing`s.
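A quick way to see the ragged-length problem, again assuming the 2×2 matrix `A` from the question: after dropping `missing`, the rows no longer have the same length, so they cannot be stacked back into a rectangular matrix.

```julia
using Statistics

A = [1 5
     6 missing]

# Each row, with missing values dropped, has its own length
rows = [collect(skipmissing(r)) for r in eachrow(A)]
# rows == [[1, 5], [6]]

# Reducing each row separately sidesteps the problem
row_means = mean.(skipmissing.(eachrow(A)))   # [3.0, 6.0]
```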

Perfect, this line worked:

```
mean.(skipmissing.(eachrow(A')))
```

(Small edit: I needed column means, so I transpose my matrix, which is number of individuals (3k) by number of SNPs (45k).)

Okay, this was my worry. I hope the Statistics package improves this soon to deal with missing values, as this is more work than it needs to be (imo…). Thank you for your help!

Thank you very much for your suggestion here, I will read this over now.

There is also `eachcol`, which gives an iterator over columns. It does not really matter whether you transpose or whether you switch from `eachrow` to `eachcol`.
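For what it's worth, here is a small check that the two spellings agree, using the 2×2 example matrix from the question rather than the real SNP data:

```julia
using Statistics

A = [1 5
     6 missing]

# Column means via the transpose, as in the line that worked above
via_transpose = mean.(skipmissing.(eachrow(A')))

# Column means by iterating over columns directly
via_eachcol = mean.(skipmissing.(eachcol(A)))

via_transpose == via_eachcol   # true; both are [3.5, 5.0]
```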

Oh shoot, I tried `eachcolumn()` and it didn’t work. Thanks, I’ll use `eachcol()`.

Well, you lose some performance in `mean.(skipmissing.(eachcol(A)))` compared to a potential `mean(skipmissing(A), dims=1)`, but the former is more general: substitute any aggregation instead of `mean` and it’ll work, without special support by the function.
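As a rough illustration of that generality, here is a sketch with a small made-up 3×3 matrix. The helper name `colwise` is hypothetical (it is not part of Statistics or Base); it just wraps the same per-column pattern so different reducers can be swapped in. The `mapslices` line is one possible way to get a 1×n result shaped like what `dims=1` would give.

```julia
using Statistics

# Made-up example data
A = [1 5       2
     6 missing 4
     3 7       missing]

# Hypothetical helper: apply any reducer `f` to each column, skipping missing values
colwise(f, M) = [f(skipmissing(c)) for c in eachcol(M)]

col_means   = colwise(mean, A)      # ≈ [3.33, 6.0, 3.0]
col_medians = colwise(median, A)    # [3.0, 6.0, 3.0]
col_maxima  = colwise(maximum, A)   # [6, 7, 4]

# Alternative that keeps a 1×3 matrix shape
col_means_1xn = mapslices(c -> mean(skipmissing(c)), A, dims=1)
```

Nothing here needs the reducer to know about `missing` or `dims`; it only has to accept an iterator.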

Anyway, there’s a long-stalled PR linked from the issue above (https://github.com/JuliaLang/julia/pull/28027), so you may wish to update/promote it if this feature seems important.

Oh I see… Thank you for this information. Well then they must be aware, I’m not much for development, I’m still trying to learn the basics. Julia is kind of a beast compared to R to learn. Thanks for all your help.