Hi all, we started talking about issues with dropping missing values automatically. In that discussion it became clear that some people do not run into the same level of ergonomic friction as others.
What workflow recommendations could we offer new users to make Julia’s existing missing handling as helpful as possible? What kinds of things should they avoid? Imagine we are writing course notes for an undergrad data analysis class or something…
Writing skipmissing everywhere sounds annoying. But I think a package should provide this rather than overwrite the default function. It could be mean*(vec) or whatever.
But when I deal with missings, they usually carry information. So mean*() could return the mean, optionally along with the number or proportion of missings and invalids.
I.e.:
mean([11, 2, 1, 10, missing]) is not the same to me as mean([missing, 2, missing, 10, missing]), which is different from mean([missing, 2, missing, 10, "6"])
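A minimal sketch of what such a function might look like. The name mean_with_info and the choice to treat any non-missing, non-Number element as "invalid" are assumptions for illustration, not an existing package API:

```julia
using Statistics

# Hypothetical helper: returns the mean of the valid numeric entries
# together with counts of missing and invalid entries.
function mean_with_info(xs)
    nmissing = count(ismissing, xs)
    # Assumption: "invalid" means non-missing and not a Number (e.g. the string "6").
    ninvalid = count(x -> !ismissing(x) && !(x isa Number), xs)
    valid = [x for x in xs if x isa Number]
    m = isempty(valid) ? missing : mean(valid)
    return (mean = m, nmissing = nmissing, ninvalid = ninvalid, n = length(xs))
end

a = mean_with_info([11, 2, 1, 10, missing])
b = mean_with_info([missing, 2, missing, 10, missing])
c = mean_with_info([missing, 2, missing, 10, "6"])
```

All three calls report a mean of 6.0, but the nmissing and ninvalid fields differ, which is the information a plain mean over skipmissing would throw away.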
I just tend to make a shorter alias like sm and write mean(sm(vec)), and maybe alias a composition of a function with sm if it comes up often enough. I could never make an “automatic” skipmissing work because I don’t always want to skip missings, so I might as well make the case-by-case version shorter to write. If I have to exclude rows with missings across multiple selected columns, I go with a lazy dropmissing on the whole dataframe.
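Roughly, the alias approach looks like the sketch below. The names sm and smean are only examples of what one might pick; skipmissing, mean, and ∘ are the standard Base/Statistics functions:

```julia
using Statistics

# Short alias for skipmissing (the name sm is arbitrary).
const sm = skipmissing

# Alias for a composition that gets used often: mean over non-missing values.
const smean = mean ∘ sm

v = [1, missing, 3]

m1 = mean(sm(v))   # explicit, case by case
m2 = smean(v)      # same thing via the composed alias
m3 = mean(v)       # no skipping: the result is missing
```

Nothing is skipped unless you ask for it, so mean(v) still propagates missing as usual.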
In that thread, I used one higher-order function to refactor a set of scalar operation variants: one variant replaces a missing result of the original operation with false, and the other propagates missing. I should note that, as scalar operations, they do nothing like skipping missings; I’m just mentioning that higher-order functions can be used here too.
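A minimal sketch of that kind of higher-order function, assuming the "false" variant simply coalesces a missing result to false. The name missing_as_false is made up here and is not the code from that thread:

```julia
# Hypothetical wrapper: takes a scalar operation f and returns a new function
# that yields false wherever f would have returned missing.
missing_as_false(f) = (args...) -> coalesce(f(args...), false)

eq_false = missing_as_false(==)

r1 = 1 == missing          # plain operation: propagates missing
r2 = eq_false(1, missing)  # wrapped operation: false instead of missing
r3 = eq_false(1, 1)        # non-missing results are unchanged

# Still a scalar operation, so it broadcasts elementwise and skips nothing.
r4 = missing_as_false(>).([1, missing, 3], 2)
```

The propagating variant needs no wrapper, since operators like == and > already return missing when given a missing argument.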