Hi!

I’m facing the following problem:

```
using DataFrames, Statistics
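df = DataFrame(A = [1, 2, missing, missing, missing, 3, 4, 5],
               B = [1, 1, 2, 2, 3, 3, 4, 4])
# `aggregate` is no longer available in current DataFrames.jl; group by B and combine instead
df_mean = combine(groupby(df, :B), :A => (x -> mean(skipmissing(x))) => :A_mean)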
```

The `mean` function returns `NaN` when using `skipmissing` and all the observations in that group are `missing`. Is there a way to change this behaviour so that it returns `missing` as well?

Thank you!

The reasoning for this behavior is that `skipmissing` applied to a vector holding only missing values, such as `Union{Missing, Int}[missing, missing]`, should behave the same as an empty `Int` vector, `Int[]`. Mean is just `mean(x) = sum(x) / length(x)`. For an empty vector the sum is `0` and the length is `0`, so `mean(Int[])` is `0 / 0`, which is `NaN`.
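
To make the arithmetic concrete, here is a small check using only the `Statistics` standard library (the names `empty_ints` and `group` are just for illustration):

```julia
using Statistics

# An empty integer vector: the sum is the additive identity 0, the length is 0
empty_ints = Int[]
sum(empty_ints)      # 0
length(empty_ints)   # 0
mean(empty_ints)     # NaN, because 0 / 0 is NaN in floating point

# A group where every value is missing: skipping the missings leaves nothing,
# so it behaves like the empty vector above
group = Union{Missing, Int}[missing, missing]
mean(skipmissing(group))  # NaN
```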

Are you coming from Stata, by chance? Julia's behavior mimics R's, but Stata propagates `missing` the way you expect it to.

The best approach would be to write a small helper function:

```
# Return missing if every entry is missing; otherwise the mean of the non-missing values
meanmissing(x) = all(ismissing, x) ? missing : mean(skipmissing(x))
```
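
A quick way to check the helper without DataFrames.jl is to group the question's data by hand. This is only a sketch; the variable names `A`, `B` and `means` are illustrative:

```julia
using Statistics

# Helper from the answer: missing if the whole group is missing
meanmissing(x) = all(ismissing, x) ? missing : mean(skipmissing(x))

# Same data as in the question, grouped by hand instead of with DataFrames.jl
A = [1, 2, missing, missing, missing, 3, 4, 5]
B = [1, 1, 2, 2, 3, 3, 4, 4]

# For each distinct key in B, apply meanmissing to the matching values of A
means = Dict(b => meanmissing(A[B .== b]) for b in unique(B))

means[1]  # 1.5
means[2]  # missing (both values are missing)
means[3]  # 3.0
means[4]  # 4.5
```

With DataFrames.jl the same helper can be passed directly as the transformation, e.g. `combine(groupby(df, :B), :A => meanmissing => :A_mean)`.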


This is exactly what I was looking for! Thanks a lot. I’m coming from Stata and Pandas, which in this case seem to behave alike.