What workflows for missing values are more ergonomic in Julia?

dlakelan · November 30, 2023, 3:00am

Continuing the discussion from Why are missing values not ignored by default?:

Hi all, we started talking about some issues with dropping missing values automatically. In the discussion it became clear that some people don't experience the same level of ergonomic friction as others.

What workflow recommendations can we come up with that we could offer to new users to make Julia’s existing missing handling be as helpful as possible? What kinds of things should they avoid? Imagine we are writing some course notes for a data analysis undergrad class or something…

Tetrakai · November 30, 2023, 4:27pm

Writing skipmissing everywhere sounds annoying. But I think a package should do it, and not overwrite the default function. It could be mean*(vec) or whatever.

But when I deal with missings, usually they contain information. So have mean*() return the mean optionally along with the number/proportion of missings and invalids.

I.e.:

mean([11, 2, 1, 10, missing]) is not the same to me as mean([missing, 2, missing, 10, missing]), which is different from mean([missing, 2, missing, 10, "6"])
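
A minimal sketch of what such a function could look like, using only the Statistics standard library. The name mean_with_info and the fields of the returned tuple are hypothetical (no package is known to export them), and treating every non-missing, non-Real entry as "invalid" is an assumption made for illustration:

```julia
using Statistics

# Hypothetical helper: returns the mean of the usable entries together with
# counts describing how much of the input was missing or invalid.
function mean_with_info(xs)
    n = length(xs)
    nmissing = count(ismissing, xs)
    # Assumption: anything that is neither missing nor a Real counts as invalid.
    ninvalid = count(x -> !ismissing(x) && !(x isa Real), xs)
    valid = Float64[x for x in xs if x isa Real]
    value = isempty(valid) ? missing : mean(valid)
    return (value = value, nmissing = nmissing, ninvalid = ninvalid,
            propmissing = nmissing / n)
end

a = mean_with_info([11, 2, 1, 10, missing])
b = mean_with_info([missing, 2, missing, 10, missing])
c = mean_with_info([missing, 2, missing, 10, "6"])
```

All three calls give a value of 6.0, but they differ in nmissing (1, 3, and 2) and in ninvalid (0, 0, and 1), which is the extra information a bare mean would throw away.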

Benny · November 30, 2023, 5:59pm

I just tend to make a shorter alias like sm and write mean(sm(vec)), maybe alias a composition of a function with sm if it happens enough. I could never make an “automatic” skipmissing work because I don’t always want to skipmissing, so I might as well make the case-by-case basis shorter to write. If I have to exclude rows with missings across multiple select columns, lazy dropmissing on the overall dataframe it is.
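
As a rough sketch of that aliasing approach: sm is the alias mentioned above, while smean is a hypothetical name for the composed version. Both are just local definitions, not anything provided by Base or Statistics:

```julia
using Statistics

# Short local alias for skipmissing; the case-by-case choice stays explicit.
const sm = skipmissing

v = [11, 2, 1, 10, missing]

m_plain = mean(v)      # propagates: the result is missing
m_skip  = mean(sm(v))  # skips the missing entry

# If the pattern comes up often, compose once and reuse (hypothetical name).
const smean = mean ∘ sm
m_composed = smean(v)
```

Nothing in Base is overwritten here, so mean(v) still propagates missing wherever skipping is not wanted.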

In that thread, I used one higher-order function to refactor a set of variant scalar operations: one variant replaces the original operation's result with false, and the other propagates missing. Note that, as scalar operations, they do not skip missings at all; I'm just mentioning that higher-order functions can be used here too.
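
The code from that thread is not reproduced here, so the following is only one possible reading of the idea. The wrapper name missing_as_false is hypothetical; it takes any scalar function and returns a variant that yields false when an argument is missing, while the unwrapped function keeps propagating missing:

```julia
# Hypothetical higher-order function: wraps a scalar operation f so that any
# missing argument produces false instead of propagating missing.
missing_as_false(f) = (args...) -> any(ismissing, args) ? false : f(args...)

lt_false = missing_as_false(<)

propagated = 1 < missing         # the original operation propagates missing
replaced   = lt_false(1, missing)

# Being scalar operations, they broadcast like any other function.
flags = lt_false.([1, missing, 3], 2)
```

Here flags is usable directly as a Bool mask, whereas broadcasting the plain < over the same data would leave a missing in the result.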
