Why are missing values not ignored by default?

jules · November 30, 2023, 5:59pm

Instead of smean or so, I’ve often thought that Julia could still use the question mark better. Currently, I think it’s only allowed for the ternary operator, but there it needs a space. So it could still be used as a unary operator. Or allowed as the ending for variable names in a future Julia version.

Maybe mean?(data) could be mean(skipmissing(data)) then. Or the missing-skipping version of mean. The thing is that this cannot be tried out in packages because current parsing rules don’t allow it.

Benny · November 30, 2023, 6:14pm

Adapting an unused (as far as I know, which is not a lot) prefix unary operator to save writing parentheses is the shortest I can think of:

julia> const ¬ = skipmissing;

julia> using Statistics

julia> mean(¬[1, 2, missing, 3])
2.0

Though I would prefer const sm = skipmissing and writing the parentheses, or even a trailing |> sm. The tab completion \neg<TAB> doesn’t save typing anyway.

I don’t think the parsing rules can be adapted to let mean? be shorthand for function composition without breaking. We do have predicate operators e.g. !isnan, but with the precedence as is, !isnan(x) actually does !(isnan(x)), so !isnan would need parentheses if not an argument in a higher order function. Better put the character in front of data.
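To make the precedence point concrete, here is a small sketch using only Base. The variable names are just for illustration.

```julia
# `!isnan(x)` parses as `!` applied to the result of `isnan(x)`,
# not as a call of the composed predicate `!isnan`.
parsed = Meta.parse("!isnan(x)")

x = 1.0
a = !isnan(x)      # negates the Bool returned by isnan(x)
b = (!isnan)(x)    # calls the composed predicate; needs the parentheses

# As an argument to a higher-order function, no parentheses are needed.
cleaned = filter(!isnan, [1.0, NaN, 2.0])
```

Both `a` and `b` give the same answer here; the difference is only in how the expression is grouped, which is why a suffix like mean? could not simply reuse this precedence.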

mkitti · November 30, 2023, 6:15pm

Ruby has ! suffixes that work similarly to those in Julia.
Ruby also has ? suffixes, but they mean the method returns a boolean.

https://docs.ruby-lang.org/en/2.0.0/syntax/methods_rdoc.html#label-Method+Names

I believe Julia’s parsing considers the possibility that this may be valid in the future since it currently requires a space before a question mark.

julia> mean?(5)
ERROR: syntax: space required before "?" operator

jules · November 30, 2023, 6:30pm

Right, that’s what I meant. The ? hasn’t gotten any meaning yet because nobody could decide on anything: whether it should be something for nothing, for missing, or for both. But at least it offers a possibility that could be explored.

CameronBieganek · November 30, 2023, 6:35pm

Yeah, I think it was considered as an option where

f?(x)

would be equivalent to

passmissing(f)(x)

In that case, it’s not really meant for reductions like mean, sum, and cor, which are the functions that seem to ruffle the most feathers in this thread. But perhaps some overloading of broadcasting could make

f?.(x)

equivalent to

f(skipmissing(x))

In fact, we could probably implement this already with a Unicode binary operator…

EDIT: Maybe two separate Unicode operators would be better. Not sure that broadcasting ? has the right semantics.

EDIT #2: I wish we had more unary post-fix operators, like '. If you use a binary operator, then something like the following won’t work, because we don’t have frankentuples:

f ⍰ (a; b=10)
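For reference, the two behaviours being contrasted in this post can be sketched with plain functions today. `passmissing_sketch` below is a hypothetical stand-in written for this example (Missings.jl provides the real `passmissing`); only Base and the Statistics standard library are assumed.

```julia
using Statistics  # for mean

# Hypothetical stand-in for passmissing: return missing as soon as any
# positional argument is missing, otherwise call f normally.
passmissing_sketch(f) = (args...; kwargs...) ->
    any(ismissing, args) ? missing : f(args...; kwargs...)

# The "f?(x)" reading: missing in, missing out.
safe_parse = passmissing_sketch(s -> parse(Int, s))
parsed_value = safe_parse("42")
parsed_missing = safe_parse(missing)

# The "f?.(x)" reading: drop missings before a reduction.
skipped_mean = mean(skipmissing([1, 2, missing, 3]))
```

The first form suits scalar functions, the second suits reductions, which is why a single ? probably cannot cover both.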

pdeffebach · November 30, 2023, 7:16pm

I think the rules would be something along the lines of

If f operates on scalars, then f?(x, y) would return missing if x or y are missing.
If f operates on collections, then f?(x) would omit missings from the collection x.

Of course, narrowing these rules and implementing them would be super hard. I wonder if a lack of strictness in the type hierarchy also contributes. Julia doesn’t have a strong notion of collection that could help make those rules clearer.

But it’s something DataFramesMeta.jl can think more about and experiment with, particularly because in the context of DataFrames the collection-scalar distinction is well identified with ByRow and the only “collection” is vectors.

CameronBieganek · November 30, 2023, 7:21pm

Parsing and lowering know nothing about what f does, so it needs to be a syntactic transformation. So, two operators would probably be needed. Something along these lines:

?

f?

translates to

passmissing(f)

??

f??(args...; kwargs...)

translates to something similar to

f(skipmissings(args...)...; kwargs...)

BioTurboNick · November 30, 2023, 7:23pm

If someone was so inspired, perhaps they could use what I did in OverflowContexts.jl to swap out the integer operator definitions.

You’d write a macro that swaps out all Missing-related operators/methods to have the permissive behaviors. Perhaps @default_skipmissings or something like that. Then you’d have a one-liner you could write to make things work the way you want. It would take a beat to recompile, but if you’re doing something interactive that is a fairly small one-time cost to avoid cluttering the code with guardrails for missings.

CameronBieganek · November 30, 2023, 7:38pm

Actually, to clarify, we would prefer to have ?? be a (unary, postfix) higher order function so that we can write things like this:

combine(df, :x => mean??)

So, a better specification than my original would be something like this:

f??

translates to something similar to

(args...; kwargs...) -> f(skipmissings(args...)...; kwargs...)

I guess one could debate about which of ? and ?? should be skipmissings and which should be passmissing.
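Without any new syntax, the higher-order version can be approximated with an ordinary function. This is only a sketch: `skip_jointly` and `skipping` are hypothetical names made up here, and `skip_jointly` is a simplified Base-only stand-in for what `skipmissings` from Missings.jl does (it assumes all arguments are vectors of the same length).

```julia
using Statistics  # for mean and cor

# Hypothetical helper: keep only the positions where none of the
# vectors has a missing value, and narrow the element types.
function skip_jointly(xs::AbstractVector...)
    keep = [!any(ismissing, vals) for vals in zip(xs...)]
    return map(x -> collect(skipmissing(x[keep])), xs)
end

# Hypothetical stand-in for the proposed postfix operator.
skipping(f) = (args...; kwargs...) -> f(skip_jointly(args...)...; kwargs...)

x = [1, 2, missing, 4, 5]
y = [2.0, missing, 6.0, 8.0, 10.0]

m = skipping(mean)(x)      # mean over 1, 2, 4, 5
c = skipping(cor)(x, y)    # uses positions 1, 4, 5 only
```

Because `skipping(mean)` is an ordinary function value, it can be passed around the same way mean?? would be in the combine example.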

CameronBieganek · November 30, 2023, 7:47pm

If we could actually get f? and f?? added to the language, that would be a very positive outcome from this rather long and winding discussion.

@adienes @alfaromartino How do you feel about writing mean??(x) and cor??(x, y)? Would that make missing handling in Julia feel more ergonomic to you?

I suspect that skipmissing and skipmissings get used more often than passmissing, so maybe we should switch it around so that it would be this:

mean?(x)
cor?(x, y)
tryparse??(Int, x)

CameronBieganek · November 30, 2023, 7:59pm

Actually, we don’t need parsing/lowering to do a syntax transformation. We just need ? and ?? to be parsed as unary post-fix operators.

dlakelan · November 30, 2023, 8:15pm

We could also probably use an existing Unicode operator? I can’t seem to find a list of them.

CameronBieganek · November 30, 2023, 8:18pm

There’s almost nothing for unary postfix operators. You can attach a superscript to ', but that’s annoying to type and kinda ugly to read.

jar1 · November 30, 2023, 8:23pm

We might want to use such important syntax for something more general rather than the domain-specific passmissing. For example, error handling, as is done in other languages. We need to be really careful when spending valuable syntax resources.
f?? being different from (f?)? isn’t ideal.

CameronBieganek · November 30, 2023, 8:26pm

That’s been the argument in the relevant GitHub issue for years now, which has prevented ? from ever being added for any use whatsoever.

True. Perhaps there’s another symbol out there that we can use for the second unary post-fix operator… Although I’m not sure it would be a dealbreaker.

aplavin · November 30, 2023, 9:15pm

And that’s not bad, if after some time we get a solution that also incorporates handling exceptions and other errors! Agree with @jar1.

While for missing handling specifically, a very simple solution suggested multiple times in this thread alone would already bring significant improvement to those struggling with the current situation. Just define smean or whatever functions in a missing-focused package!
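A minimal sketch of what such definitions could look like. The names `smean` and `ssum` are hypothetical; no existing package is implied.

```julia
using Statistics  # for mean

# Hypothetical missing-skipping variants of common reductions.
smean(x) = mean(skipmissing(x))
ssum(x) = sum(skipmissing(x))

data = [1, 2, missing, 3]
avg = smean(data)
total = ssum(data)
```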

davidanthoff · November 30, 2023, 9:26pm

One option with new syntax is to have the parser handle it gracefully, but not actually make it valid regular Julia syntax. Macros can then use that syntax for their domain specific use-case. That can sometimes strike a balance between using up scarce syntax options for a domain specific use-case and still enabling domain-specific short syntax options.

The handling of curly brackets is a nice prior example: they don’t do anything at the top level in regular Julia code at the moment, but macros can utilize them, which has been super useful both in Query and Vega/VegaLite for example.
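As a small illustration of the curly-bracket case, the parser accepts the syntax and produces an expression that a macro would receive as its argument, even though evaluating it as regular code is an error. The sketch below only inspects the parsed expression; it does not show how Query or VegaLite actually use it.

```julia
# Parse a curly-bracket expression without evaluating it.
ex = Meta.parse("{a, b}")

head = ex.head   # the expression kind a macro would see
items = ex.args  # the elements inside the brackets
```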

CameronBieganek · November 30, 2023, 9:50pm

I’ll admit that having a post-fix ? operator defined as

?(f) = (args...; kwargs...) -> f(skipmissings(args...)...; kwargs...)

is not as universally useful as one might hope. It works for the univariate reduction functions (mean, sum, etc) and for cor and var, but it generally doesn’t work for other multivariate function calls, like

quantile?(xs, [0.1, 0.2])

or

reduce?(+, xs)
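For these two calls, only the data argument should have its missings skipped, so the explicit spelling stays unambiguous. A small sketch, assuming only Base and the Statistics standard library (the example data are made up):

```julia
using Statistics  # for quantile

xs = [1, missing, 3, 5]

# Skip missings in the data only; the probabilities are left alone.
qs = quantile(skipmissing(xs), [0.0, 0.5, 1.0])

# Skip missings in the collection; `+` is not a collection at all.
total = reduce(+, skipmissing(xs))
```

A blanket rule that wraps every positional argument has no way of knowing that `[0.1, 0.2]` and `+` are not data, which is the limitation described above.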

Storopoli · November 30, 2023, 10:00pm

I think that increasing data/stats namespace is not ideal here.

I really like this idea of the ? suffix.
It is very “Julian” in the sense that it is expressive, concise, and powerful.
Similar to the ! suffix.

danielwe · November 30, 2023, 10:32pm

Perhaps it would be more useful to make this DataFrames.jl-specific and apply it at the level of data frame column selection? Something like

combine(df, [:col1?, :col2?] => cor)
