I ran into this today:
using StatsBase
mean(skipmissing([missing]))
> ERROR: ArgumentError: reducing over an empty collection is not allowed
> Stacktrace:
> [1] _empty_reduce_error()
> @ Base ./reduce.jl:299
> [2] reduce_empty(#unused#::typeof(+), #unused#::Core.TypeofBottom)
> @ Base ./reduce.jl:310
> [3] mapreduce_empty(#unused#::typeof(identity), op::Function, T::Type)
> @ Base ./reduce.jl:343
> ...
The use case for me is that I have an array where sometimes all values are missing. So I have to call skipmissing first, then check whether the result is empty, and only then call mean. Is there a way to avoid the branch?
FWIW, R returns NaN in this case.
I don’t think there is a good solution for this at the moment, unfortunately.
NaN isn’t a great return type in this context, since mean might not apply only to numbers. You can take the mean of a Vector of Vectors, for example.
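To illustrate the point about non-numeric inputs, here is a small sketch using the Statistics standard library (the variable names are just for illustration). The mean of a vector of vectors is itself a vector, so there is no single NaN-like value that would suit every element type:

```julia
using Statistics

vs = [[1.0, 2.0], [3.0, 4.0]]
m = mean(vs)                  # elementwise mean of the inner vectors: [2.0, 3.0]

# skipmissing behaves the same way when some entries are missing
ws = Union{Vector{Float64}, Missing}[[1.0, 2.0], missing, [3.0, 4.0]]
m2 = mean(skipmissing(ws))    # also [2.0, 3.0]
```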
Maybe you can fix this upstream? If the vector’s element type includes Float64, you get NaN instead of an error:
julia> t = Union{Float64, Missing}[missing, missing, missing]
3-element Vector{Union{Missing, Float64}}:
missing
missing
missing
julia> mean(skipmissing(t))
NaN
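The difference comes from the element type that skipmissing reports. A quick check along these lines (a sketch using the Statistics standard library; the variable names are illustrative) shows that an all-missing Vector{Missing} yields Union{}, the empty type, so mean has no numeric zero to fall back on. The Union{Float64, Missing} version yields Float64, so mean can return the same NaN as mean(Float64[]):

```julia
using Statistics

typed   = Union{Float64, Missing}[missing, missing, missing]
untyped = [missing, missing, missing]        # element type is just Missing

T_typed   = eltype(skipmissing(typed))       # Float64
T_untyped = eltype(skipmissing(untyped))     # Union{}: no value can have this type

m = mean(skipmissing(typed))                 # NaN, like mean(Float64[])
# mean(skipmissing(untyped)) throws an error instead, as in the original post
```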
Would this work?
y = [missing missing]
x = skipmissing(y); isempty(x) ? NaN : mean(x)
NaN
y = [1 3]
x = skipmissing(y); isempty(x) ? NaN : mean(x)
2.0
Thanks! So
mean(skipmissing(Union{Float64,Missing}[missing]))
> NaN
but
mean(skipmissing(Union{Missing}[missing]))
gives the error. So I should make sure the element type is a union that includes Float64 before calling mean, and it should work. Great!
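If the vector already exists with element type Missing, one way to follow that approach is to copy it into a vector with a wider element type before calling mean. A minimal sketch, assuming Float64 is the type the values would have had:

```julia
using Statistics

v = [missing, missing]                     # Vector{Missing}
w = Vector{Union{Float64, Missing}}(v)     # same entries, element type now admits Float64

m = mean(skipmissing(w))                   # NaN rather than an error
```

This makes a copy of the data, so setting the element type when the vector is first created avoids the extra allocation.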
Regarding the return value – ideally it would be missing, no?
No, I don’t think the return value would be missing. The return value should be the same as mean(Float64[]).
I guess so. Or maybe when you read in the data, you should just make sure that if a column is all missing, Julia knows that those values could be Float64s.
Got it. That makes sense.
The vector is simulation output that I create myself, so I’ll just make sure that there is a type set when I create the vector. Thanks again for the quick help!
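For simulation output that you allocate yourself, the element type can be fixed at creation time. A sketch using the Array{T}(missing, dims) constructor that Base provides for element types that include Missing; n and the stored values are made up for illustration:

```julia
using Statistics

n = 4
results = Vector{Union{Float64, Missing}}(missing, n)   # all entries start as missing

m_empty = mean(skipmissing(results))   # NaN: no run has filled in a value yet

results[2] = 1.5                       # hypothetical simulation outputs
results[4] = 2.5
m_filled = mean(skipmissing(results))  # 2.0
```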