Compute mean of array where all values could be missing

I ran into this today:

using StatsBase
> ERROR: ArgumentError: reducing over an empty collection is not allowed
> Stacktrace:
>  [1] _empty_reduce_error()
>    @ Base ./reduce.jl:299
>  [2] reduce_empty(#unused#::typeof(+), #unused#::Core.TypeofBottom)
>    @ Base ./reduce.jl:310
>  [3] mapreduce_empty(#unused#::typeof(identity), op::Function, T::Type)
>    @ Base ./reduce.jl:343
> ...

The use case for me is that I have an array where sometimes all values are missing. So I have to do skipmissing first, then check whether it’s empty, and then call mean. Is there a way to avoid the branch?

FWIW, R returns NaN in this case.


I don’t think there is a good solution for this at the moment, unfortunately.

NaN isn’t a great return value in this context, since mean doesn’t only apply to numbers. You can take the mean of a Vector of Vectors, for example.
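To illustrate the point, here is a minimal sketch using the standard library Statistics package. mean sums the inner vectors elementwise and divides by the count, so its result is a vector, and a scalar NaN would be the wrong kind of value for the empty case:

```julia
using Statistics

# Each element is itself a vector; mean adds them elementwise and divides by the count.
vs = [[1.0, 2.0], [3.0, 4.0]]
m = mean(vs)   # [2.0, 3.0]
```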

Maybe you can fix this upstream, where the vector is created, by giving it an element type that includes Float64:

julia> t = Union{Float64, Missing}[missing, missing, missing]
3-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing

julia> mean(skipmissing(t))
NaN

Would this work?

y = [missing missing]
x = skipmissing(y)
isempty(x) ? NaN : mean(x)   # NaN

y = [1 3]
x = skipmissing(y)
isempty(x) ? NaN : mean(x)   # 2.0
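The same branch can be wrapped in a small function so each call site stays a single expression. This is only a sketch: mean_or_nan is a hypothetical name, not part of Statistics or StatsBase, and it returns NaN (as R does) only when nothing is left after skipping the missing values:

```julia
using Statistics

# Hypothetical helper (not part of Statistics or StatsBase):
# mean of the non-missing values, or NaN when every value is missing.
function mean_or_nan(v)
    x = skipmissing(v)
    return isempty(x) ? NaN : mean(x)
end

mean_or_nan([missing, missing])   # NaN
mean_or_nan([1, missing, 3])      # 2.0
```

It still branches internally, and it always returns a Float64 NaN for the all-missing case, which is exactly the trade-off raised earlier for non-numeric inputs.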


Thanks! So it’s a vector whose element type is only Missing that gives the error. I should make sure to set a union type before calling mean, and it should work. Great!
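If the vector already exists with element type Missing, one way to set the union type before calling mean is to convert it. A minimal sketch, assuming the non-missing values would have been Float64:

```julia
using Statistics

v = [missing, missing, missing]            # element type is just Missing
# Widen the element type so that skipmissing(w) has element type Float64.
w = convert(Vector{Union{Missing, Float64}}, v)
m = mean(skipmissing(w))                    # NaN, like mean(Float64[])
```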

Regarding the return value – ideally it would be missing, no?


No, I don’t think the return value should be missing. It should be the same as mean(Float64[]), which is NaN.

I guess so. Or maybe when you read in the data, you should make sure that if a column is all missing, Julia knows those values could be Float64s.


Got it. That makes sense.

The vector is simulation output that I create myself, so I’ll just make sure that there is a type set when I create the vector. Thanks again for the quick help!
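A minimal sketch of declaring the element type when the output vector is created, so that an all-missing run gives NaN instead of an error. The loop body is a hypothetical stand-in for a simulation step that may or may not produce a value; only the construction Vector{Union{Missing, Float64}}(missing, n) is the point:

```julia
using Statistics

n = 5
# Declare the element type up front; every slot starts out as missing.
out = Vector{Union{Missing, Float64}}(missing, n)

# Hypothetical stand-in for a simulation step that sometimes produces no value.
for i in 1:n
    r = rand()
    if r > 0.5
        out[i] = r
    end
end

m = mean(skipmissing(out))   # always a Float64; NaN if every slot stayed missing
```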