Some useful comments were posted on the closed issue "Fail to ignore NaN when calculating mean of an array" #4552. The appropriate place for that discussion is here on Discourse, so I am moving the thread here:
@timholy’s Images.jl and meanfinite have fast, flexible dimensioned averages that ignore non-finite values.
I have reworked condmean in ConditionalMean to accumulate subject to an arbitrary condition (which could be to ignore sentinel values, e.g. NaN, -999). It can also average a callable function of the array. nanstd() methods are provided. Tests and pull requests are welcome.
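To make the idea concrete, here is a minimal sketch of a conditional mean. It is not the ConditionalMean code: the name condmean_sketch, its argument order, and the f keyword are assumptions made up for illustration, and it averages over the whole array rather than along chosen dimensions.

```julia
# Hypothetical helper, NOT the ConditionalMean API: mean of f(x) over the
# elements x of A for which cond(x) is true.
function condmean_sketch(cond, A; f=identity)
    total = 0.0   # running sum of f(x) over accepted elements
    n = 0         # number of accepted elements
    for x in A
        if cond(x)
            total += f(x)
            n += 1
        end
    end
    # Return NaN when no element satisfies the condition
    return n == 0 ? NaN : total / n
end

data = [1.0, NaN, 3.0, -999.0, 5.0]

# Condition that rejects NaN and the -999 sentinel
valid(x) = !isnan(x) && x != -999.0

condmean_sketch(valid, data)            # mean of 1, 3, 5
condmean_sketch(valid, data; f=abs2)    # mean of 1, 9, 25
```

Passing the condition as a function keeps the choice of sentinel out of the averaging loop, so the same code can skip NaN, -999, or anything else the caller wants to exclude.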
Now this devolves into an ease-of-use complaint: the unsettled standard(s) for handling missing data are an impediment to developing useful functions and methods. The long list of breaking, late-breaking, and deprecated ways to do this includes
Null, Unions thereof, and sentinel values for numeric types (e.g.
This diversity and changing architecture lead to severe usability problems. Searching the discussions, I find different computer science and data science philosophies and practical reasons for one approach over another. I respect these arguments, but it’s hard to tell what works, what’s supported, deprecated, or broken.